Abstract

We discuss a graph-based approach for testing spatial point patterns. This approach 
falls under the category of data-random graphs, which have been introduced and used 
for statistical pattern recognition in recent years. Our goal is to test complete spatial 
randomness against segregation and association between two or more classes of points. 
To attain this goal, we use a particular type of parametrized random digraph called 
proximity catch digraph (PCD), which is based on the relative positions of the data
points from various classes. The statistic we employ is the relative density of the PCD.
When scaled properly, the relative density of the PCD is a U-statistic. We derive
the asymptotic distribution of the relative density, using the standard central limit
theory of U-statistics. The finite sample performance of the test statistic is evaluated
by Monte Carlo simulations, and the asymptotic performance is assessed via Pitman's 
asymptotic efficiency, thereby yielding the optimal parameters for testing. Furthermore, 
the methodology discussed in this article is also valid for data in multiple dimensions.

1 Introduction



In this article, we discuss a graph-based approach for testing spatial point patterns. In
the statistical literature, the analysis of spatial point patterns in natural populations has been
extensively studied and has important implications in epidemiology, population biology,
and ecology. We investigate the patterns of one class with respect to other classes, rather
than the pattern of one class with respect to the ground. The spatial relationships among
two or more groups have important implications, especially for plant species. See, for example,
Pielou (1961), Dixon (1994), and Dixon (2002).

Our goal is to test the spatial pattern of complete spatial randomness against spatial 
segregation or association. Complete spatial randomness (CSR) is roughly defined as the 
lack of spatial interaction between the points in a given study area. Segregation is the 
pattern in which points of one class tend to cluster together, i.e., form one-class clumps. In 
association, the points of one class tend to occur more frequently around points from the 
other class. For convenience and generality, we refer to the different types of points as "classes",
but the class can be replaced by any characteristic of an observation at a particular location.
For example, the pattern of spatial segregation has been investigated for species (Diggle
(2003)), age classes of plants (Hamill and Wright (1986)), and sexes of dioecious plants
(Nanami et al. (1999)).



We use special graphs called proximity catch digraphs (PCDs) for testing CSR against
segregation or association. In recent years, Priebe et al. (2001) introduced a random digraph
related to PCDs (called class cover catch digraphs) in $\mathbb{R}$ and extended it to multiple
dimensions. DeVinney et al. (2002), Marchette and Priebe (2003), Priebe et al. (2003b),
and Priebe et al. (2003a) demonstrated relatively good performance of it in classification.
In this article, we define a new class of random digraphs (called PCDs) and apply it in
testing against segregation or association. A PCD consists of a set of vertices and a
set of (directed) edges. For example, in the two-class case, with classes $\mathcal{X}$ and $\mathcal{Y}$, the X
points are the vertices and there is an arc (directed edge) from $x_1 \in \mathcal{X}$ to $x_2 \in \mathcal{X}$, based
on a binary relation which measures the relative allocation of $x_1$ and $x_2$ with respect to Y
points. By construction, in our PCDs, X points further away from Y points will be more
likely to have more arcs directed to other X points, compared to the X points closer to
the Y points. Thus, the relative density (number of arcs divided by the total number of
possible arcs) is a reasonable statistic to apply to this problem. To illustrate our methods,
we provide three artificial data sets, one for each pattern. These data sets are plotted in
Figure 1, where Y points are at the vertices of the triangles, and X points are depicted
as squares. Observe that we only consider the X points in the convex hull of Y points,
since in its current form, our proposed methodology only works for such points. Hence we
avoid using a real-life example, and instead use these artificial pattern realizations for illustrative
purposes. Under segregation (left) the relative density of our PCD will be larger compared
to the CSR case (middle), while under association (right) the relative density will be smaller
compared to the CSR case.

The statistical tool we utilize is the asymptotic theory of U-statistics. Properly scaled,
the relative density of our PCDs is a U-statistic, which is asymptotically
normal by the general central limit theory of U-statistics. For the digraphs introduced by
Priebe et al. (2001), whose relative density is also of the U-statistic form, the asymptotic
mean and variance of the relative density are not analytically tractable, due to the geometric
difficulties encountered. However, the PCD we introduce here is a parametrized family of
random digraphs, whose relative density has tractable asymptotic mean and variance.

Ceyhan and Priebe introduced an (unparametrized) version of this PCD and another
parametrized family of PCDs in Ceyhan and Priebe (2003) and Ceyhan and Priebe (2005),
respectively. Ceyhan and Priebe (2005) used the domination number (which is another
statistic based on the number of arcs from the vertices) of the second parametrized family for
testing segregation and association. The domination number approach is appropriate when
both classes are comparably large. Ceyhan et al. (2006) used the relative density of the same
PCD for testing the spatial patterns. The new parametrized family of PCDs we introduce
has more geometric appeal, is simpler in its distributional parameters in the asymptotics, and
has a bounded parameter range.

Using the Delaunay triangulation of the Y observations, we will define the parametrized
version of the proximity maps of Ceyhan and Priebe (2003) in Section 3.1, for which the
calculations regarding the distribution of the relative density are tractable. We then
can use the relative density of the digraph to construct a test of complete spatial randomness
against the alternatives of segregation or association, which are defined explicitly in Sections
2 and 4.1. We will calculate the asymptotic distribution of the relative density for these
digraphs, under both the null distribution and the alternatives, in Sections 4.2 and 4.3,
respectively. This procedure results in a consistent test, as will be shown in Section 5.1.
The finite sample behaviour (in terms of power) is analyzed using Monte Carlo simulation
in Section 5.2. The Pitman asymptotic efficiency is analyzed in Section 5.2.3. The multiple-triangle
case is presented in Section 5.3 and the extension to higher dimensions is presented
in Section 5.4. All proofs are provided in the Appendix.

Figure 1: Realizations of segregation (left), CSR (middle), and association (right) for
$|\mathcal{Y}| = 10$ and $|\mathcal{X}| = 1000$. Y points are at the vertices of the triangles, and X points are
squares.

2 Spatial Point Patterns



For simplicity, we describe the spatial point patterns for two-class populations. The null
hypothesis for spatial patterns has been a controversial topic in ecology from the early days
(Gotelli and Graves (1996)). But in general, the null hypothesis consists of two random
pattern types: complete spatial randomness or random labeling.

Under complete spatial randomness (CSR) for a spatial point pattern $\{X(D), A(D) \in
\mathbb{R}^2\}$, where $A(\cdot)$ denotes the area functional, we have

(i) given n events in domain D, the events are an independent random sample from a 
uniform distribution on D; 

(ii) there is no spatial interaction. 

Furthermore, the number of events in any planar region with area $A(D)$ follows a Poisson
distribution with mean $\lambda \cdot A(D)$, whose probability mass function is given by

$$f_{X(D)}(x) = \frac{e^{-\lambda A(D)} \, (\lambda A(D))^x}{x!}, \quad x \in \{0, 1, 2, \ldots\}$$

where $\lambda$ is the intensity of the Poisson distribution.
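As a small illustration of the Poisson count model above, the following Python sketch evaluates the probability mass function of the number of events in a region. The function name `csr_count_pmf` and its parameter names are hypothetical choices made for this example; they do not come from the article.

```python
import math


def csr_count_pmf(x, intensity, area):
    """P(X(D) = x) when the count in D is Poisson with mean intensity * area.

    x         -- non-negative integer number of events
    intensity -- Poisson intensity (lambda in the text), assumed >= 0
    area      -- area A(D) of the region, assumed >= 0
    """
    if x < 0 or intensity < 0 or area < 0:
        raise ValueError("x, intensity and area must be non-negative")
    mean = intensity * area
    return math.exp(-mean) * mean ** x / math.factorial(x)
```

For instance, with intensity 2 and area 1.5 the expected count is 3, and `csr_count_pmf(0, 2.0, 1.5)` is the probability that the region is empty.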

Under random labeling, class labels are assigned to a fixed set of points randomly so 
that the labels are independent of the locations. Thus, random labeling is less restrictive 
than CSR. But conditional on a set of points from CSR, both processes are equivalent. We 
only consider a special case of CSR as our null hypothesis in this article. That is, only X 
points are assumed to be uniformly distributed over the convex hull of Y points. 

The alternative patterns fall under two major categories called association and segregation.
Association occurs if the points from the two classes together form clumps or clusters.
That is, association occurs when members of one class have a tendency to attract members
of the other class, as in symbiotic species, so that the $X_i$ will tend to cluster around the
elements of $\mathcal{Y}$. For example, in plant biology, X points might be parasitic plants exploiting
Y points. As another example, X and Y points might represent mutualistic plant species,
so they depend on each other to survive. In epidemiology, Y points might be contaminant
sources, such as a nuclear reactor or a factory emitting toxic gases, and X points might be
the residences of cases (incidences) of certain diseases caused by the contaminant, e.g., some
type of cancer. Segregation occurs if the members of the same class tend to be clumped or
clustered (see, e.g., Pielou (1961)). Many different forms of segregation are possible. Our
methods will be useful only for the segregation patterns in which the two classes more or
less share the same support (habitat), and members of one class have a tendency to repel
members of the other class. For instance, it may be the case that one type of plant does
not grow well in the vicinity of another type of plant, and vice versa. This implies, in our
notation, that the $X_i$ are unlikely to be located near any elements of $\mathcal{Y}$. See, for instance,
Coomes et al. (1999) and Dixon (1994). In plant biology, Y points might represent a tree
species with a large canopy, so that other plants (X points) that need light cannot grow
around these trees. As another interesting but contrived example, consider the arsonist
who wishes to start fires with maximum duration time (hence maximum damage), so that
he starts the fires at the furthest points possible from fire houses in a city. Then Y points
could be the fire houses, while X points will be the locations of arson cases.



We consider completely mapped data, i.e., the locations of all events in a defined space
are observed, rather than sparsely sampled data (where only a random subset of locations is
observed).



3 Data-Random Proximity Catch Digraphs 

In general, in a random digraph, there is an arc between two vertices, with a fixed proba- 
bility, independent of other arcs and vertex pairs. However, in our approach, arcs with a 
shared vertex will be dependent. Hence the name data-random digraphs. 

Let $(\Omega, \mathcal{M})$ be a measurable space and consider a function $N : \Omega \times 2^{\Omega} \rightarrow 2^{\Omega}$, where $2^{\Omega}$
represents the power set of $\Omega$. Then given $\mathcal{Y} \subset \Omega$, the proximity map $N_{\mathcal{Y}}(\cdot) = N(\cdot, \mathcal{Y}) :
\Omega \rightarrow 2^{\Omega}$ associates a proximity region $N_{\mathcal{Y}}(x) \subset \Omega$ with each point $x \in \Omega$. The region $N_{\mathcal{Y}}(x)$
is defined in terms of the distance between $x$ and $\mathcal{Y}$.

If $\mathcal{X}_n := \{X_1, X_2, \ldots, X_n\}$ is a set of $\Omega$-valued random variables, then the $N_{\mathcal{Y}}(X_i)$,
$i = 1, \ldots, n$, are random sets. If the $X_i$ are independent and identically distributed, then so
are the random sets $N_{\mathcal{Y}}(X_i)$.

Define the data-random proximity catch digraph $D$ with vertex set $\mathcal{V} = \{X_1, \ldots, X_n\}$
and arc set $\mathcal{A}$ by $(X_i, X_j) \in \mathcal{A} \iff X_j \in N_{\mathcal{Y}}(X_i)$, where point $X_i$ "catches" point $X_j$. The
random digraph $D$ depends on the (joint) distribution of the $X_i$ and on the map $N_{\mathcal{Y}}$. The
adjective proximity (for the catch digraph $D$ and for the map $N_{\mathcal{Y}}$) comes from thinking
of the region $N_{\mathcal{Y}}(x)$ as representing those points in $\Omega$ "close" to $x$ (Toussaint (1980) and
Jaromczyk and Toussaint (1992)).

The relative density of a digraph $D = (\mathcal{V}, \mathcal{A})$ of order $|\mathcal{V}| = n$ (i.e., the number of vertices
is $n$), denoted $\rho(D)$, is defined as

$$\rho(D) = \frac{|\mathcal{A}|}{n(n-1)}$$

where $|\cdot|$ denotes the set cardinality functional (Janson et al. (2000)).

Thus $\rho(D)$ represents the ratio of the number of arcs in the digraph $D$ to the number
of arcs in the complete symmetric digraph of order $n$, namely $n(n-1)$.

If $X_1, \ldots, X_n \stackrel{iid}{\sim} F$, then the relative density of the associated data-random proximity
catch digraph $D$, denoted $\rho(\mathcal{X}_n; h, N_{\mathcal{Y}})$, is a U-statistic,

$$\rho(\mathcal{X}_n; h, N_{\mathcal{Y}}) = \frac{1}{n(n-1)} \sum\sum_{i<j} h(X_i, X_j; N_{\mathcal{Y}}) \tag{1}$$

where

$$h(X_i, X_j; N_{\mathcal{Y}}) = I\{(X_i, X_j) \in \mathcal{A}\} + I\{(X_j, X_i) \in \mathcal{A}\}
= I\{X_j \in N_{\mathcal{Y}}(X_i)\} + I\{X_i \in N_{\mathcal{Y}}(X_j)\} \tag{2}$$

with $I(\cdot)$ being the indicator function. We denote $h(X_i, X_j; N_{\mathcal{Y}})$ as $h_{ij}$ henceforth for brevity
of notation. Although the digraph is not symmetric (since $(x, y) \in \mathcal{A}$ does not necessarily
imply $(y, x) \in \mathcal{A}$), $h_{ij}$ is defined as the number of arcs in $D$ between vertices $X_i$ and $X_j$, in
order to produce a symmetric kernel with finite variance (Lehmann (1988)).
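The relative density and its symmetric kernel can be sketched in a few lines of Python. This is a minimal illustration only: the names `kernel_h`, `relative_density`, and `toy_catches` are hypothetical, and the proximity map is passed in as a predicate `catches(x, z)` (true when $z \in N_{\mathcal{Y}}(x)$) so that any proximity map can be plugged in. The one-dimensional toy map at the end is an assumption made purely to have something concrete to run.

```python
from itertools import combinations


def kernel_h(xi, xj, catches):
    """Symmetric kernel h_ij: number of arcs (0, 1 or 2) between xi and xj."""
    return int(bool(catches(xi, xj))) + int(bool(catches(xj, xi)))


def relative_density(points, catches):
    """Number of arcs divided by n(n - 1), the arc count of the complete digraph."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two vertices")
    # Summing h_ij over unordered pairs i < j counts every arc exactly once.
    arcs = sum(kernel_h(a, b, catches) for a, b in combinations(points, 2))
    return arcs / (n * (n - 1))


def toy_catches(x, z):
    """Toy 1-D proximity map (illustrative assumption): x catches z if |z - x| <= x."""
    return abs(z - x) <= x
```

With the toy map and the points 0, 1, 3, the arcs are (1, 0), (3, 0) and (3, 1), so `relative_density([0.0, 1.0, 3.0], toy_catches)` is 3/6.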



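The random variable $\rho_n := \rho(\mathcal{X}_n; h, N_{\mathcal{Y}})$ depends on $n$ and $N_{\mathcal{Y}}$ explicitly and on $F$
implicitly. The expectation $E[\rho_n]$, however, is independent of $n$ and depends only on $F$
and $N_{\mathcal{Y}}$:

$$0 \le E[\rho_n] = \frac{1}{2} E[h_{12}] \le 1 \quad \text{for all } n \ge 2. \tag{3}$$

The variance $\mathrm{Var}[\rho_n]$ simplifies to

$$0 \le \mathrm{Var}[\rho_n] = \frac{1}{2n(n-1)} \mathrm{Var}[h_{12}] + \frac{n-2}{n(n-1)} \mathrm{Cov}[h_{12}, h_{13}] \le 1/4. \tag{4}$$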



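A central limit theorem for U-statistics (Lehmann (1988)) yields

$$\sqrt{n}\,(\rho_n - E[\rho_n]) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \mathrm{Cov}[h_{12}, h_{13}]) \tag{5}$$

provided that $\mathrm{Cov}[h_{12}, h_{13}] > 0$. The asymptotic variance of $\rho_n$, $\mathrm{Cov}[h_{12}, h_{13}]$, depends
only on $F$ and $N_{\mathcal{Y}}$. Thus, we need determine only $E[h_{12}]$ and $\mathrm{Cov}[h_{12}, h_{13}]$ in order to
obtain the normal approximation

$$\rho_n \stackrel{\text{approx}}{\sim} \mathcal{N}(E[\rho_n], \mathrm{Var}[\rho_n]) = \mathcal{N}\left(\frac{E[h_{12}]}{2}, \frac{\mathrm{Cov}[h_{12}, h_{13}]}{n}\right) \quad \text{for large } n. \tag{6}$$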

3.1 The r-Factor Central Similarity Proximity Catch Digraphs 

We define the r-factor central similarity proximity map briefly. Let $\Omega = \mathbb{R}^2$ and let $\mathcal{Y} =
\{y_1, y_2, y_3\} \subset \mathbb{R}^2$ be three non-collinear points. Denote the triangle (including the interior)
formed by the points in $\mathcal{Y}$ as $T(\mathcal{Y})$. For $r \in [0, 1]$, define $N_{CS}^r$ to be the r-factor central
similarity proximity map as follows; see also Figure 2. Let $e_j$ be the edge opposite vertex
$y_j$ for $j = 1, 2, 3$, and let "edge regions" $R(e_1)$, $R(e_2)$, $R(e_3)$ partition $T(\mathcal{Y})$ using segments
from the center of mass of $T(\mathcal{Y})$ to the vertices. For $x \in T(\mathcal{Y}) \setminus \mathcal{Y}$, let $e(x)$ be the edge
in whose region $x$ falls; $x \in R(e(x))$. If $x$ falls on the boundary of two edge regions,
we assign $e(x)$ arbitrarily. For $r \in (0, 1]$, the r-factor central similarity proximity region
$N_{CS}^r(x) = N_{\mathcal{Y}}(x)$ is defined to be the triangle $T_r(x)$ with the following properties:

(i) $T_r(x)$ has an edge $e_r(x)$ parallel to $e(x)$ such that $d(x, e_r(x)) = r\, d(x, e(x))$ and
$d(e_r(x), e(x)) \le d(x, e(x))$, where $d(x, e(x))$ is the Euclidean (perpendicular) distance
from $x$ to $e(x)$,

(ii) $T_r(x)$ has the same orientation as and is similar to $T(\mathcal{Y})$,

(iii) $x$ is at the center of mass of $T_r(x)$.
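Properties (i) to (iii) can be turned into a short construction. The Python sketch below is a derivation under stated assumptions rather than code from the article: since the center of mass of a triangle lies at one third of the altitude from each edge, a triangle similar to $T(\mathcal{Y})$ with the same orientation, centered at $x$, and with $d(x, e_r(x)) = r\, d(x, e(x))$ is $T(\mathcal{Y})$ scaled by the factor $3 r \lambda_{\min}(x)$ about its center of mass and translated to $x$, where $\lambda_{\min}(x)$ is the smallest barycentric coordinate of $x$. The names `barycentric`, `cs_region_vertices`, and `in_cs_region` are hypothetical, and $x$ is assumed to lie in $T(\mathcal{Y})$.

```python
import math


def barycentric(p, tri):
    """Barycentric coordinates of point p with respect to triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    l2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return (l1, l2, 1.0 - l1 - l2)


def cs_region_vertices(x, tri, r):
    """Vertices of T_r(x): tri scaled by 3*r*min(lambda(x)) and centered at x."""
    scale = 3.0 * r * min(barycentric(x, tri))
    cx = sum(v[0] for v in tri) / 3.0
    cy = sum(v[1] for v in tri) / 3.0
    return [(x[0] + scale * (vx - cx), x[1] + scale * (vy - cy)) for vx, vy in tri]


def in_cs_region(p, x, tri, r, tol=1e-12):
    """True if p lies in the proximity region of x (x assumed to be in tri).

    Equivalent barycentric form of membership in T_r(x):
    lambda_j(p) >= lambda_j(x) - r * min_k lambda_k(x) for j = 1, 2, 3.
    When x is on the boundary or r = 0 this reduces to the singleton {x}.
    """
    lam_x = barycentric(x, tri)
    lam_p = barycentric(p, tri)
    shrink = r * min(lam_x)
    return all(lp >= lx - shrink - tol for lp, lx in zip(lam_p, lam_x))


STANDARD_TRIANGLE = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0))
```

For example, with $x$ at the center of mass and $r = 1$ the region returned by `cs_region_vertices` is the whole triangle, while for $r = 1/2$ it is the triangle shrunk by half about the center of mass.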

Note that (i) implies the "r-factor", (ii) implies "similarity", and (iii) implies "central"
in the name, r-factor central similarity proximity map. Notice that $r > 0$ implies that
$x \in N_{CS}^r(x)$ and $r \le 1$ implies that $N_{CS}^r(x) \subseteq T(\mathcal{Y})$ for all $x \in T(\mathcal{Y})$. For $x \in \partial(T(\mathcal{Y}))$ and
$r \in [0, 1]$, we define $N_{CS}^r(x) = \{x\}$; for $r = 0$ and $x \in T(\mathcal{Y})$ we also define $N_{CS}^r(x) = \{x\}$.
Let $T(\mathcal{Y})^o$ be the interior of the triangle $T(\mathcal{Y})$. Then for all $x \in T(\mathcal{Y})^o$ the edges $e_r(x)$
and $e(x)$ are coincident iff $r = 1$. Observe that the central similarity proximity map in
Ceyhan and Priebe (2003) is $N_{CS}^r(\cdot)$ with $r = 1$. Hence by definition, $(x, y)$ is an arc of
the r-factor central similarity PCD iff $y \in N_{CS}^r(x)$.

Figure 2: Construction of r-factor central similarity proximity region, $N_{CS}^r(x)$ (shaded
region).

Notice that $X_i \stackrel{iid}{\sim} F$, with the additional assumption that the non-degenerate two-dimensional
probability density function $f$ exists with support in $T(\mathcal{Y})$, implies that the
special case in the construction of $N_{CS}^r$ (namely, $X$ falling on the boundary of two edge regions)
occurs with probability zero.

For a fixed $r \in (0, 1]$, $N_{CS}^r(x)$ gets larger (in area) as $x$ gets further away from the
edges (or equivalently gets closer to the center of mass, $C_M$) in the sense that $d(x, e(x))$
increases (or equivalently $d(C_M, e_r(x))$ decreases). Hence for points in $T(\mathcal{Y})$, the further
the points are from the vertices $\mathcal{Y}$ (or the closer the points are to $C_M$ in the above sense), the
larger the area of $N_{CS}^r(x)$. Hence, it is more likely for such points to catch other points, i.e.,
have more arcs directed to other points. Therefore, if more X points are clustered around
the center of mass, then the digraph is more likely to have more arcs, hence larger relative
density. So, under segregation, relative density is expected to be larger than that in CSR or
association. On the other hand, in the case of association, i.e., when X points are clustered
around Y points, the regions $N_{CS}^r(x)$ tend to be smaller in area, hence catch fewer points,
thereby resulting in a small number of arcs, or a smaller relative density compared to CSR
or segregation. See, for example, Figure 3 with 3 Y points and 20 X points for segregation
(top left), CSR (middle left) and association (bottom left). The corresponding arcs in
the r-factor central similarity PCD with r = 1 are plotted on the right in Figure 3. The
corresponding relative density values (for r = 1) are .2579, .1395, and .0974, respectively.

Furthermore, for a fixed $x \in T(\mathcal{Y})^o$, $N_{CS}^r(x)$ gets larger (in area) as r increases. So,
as r increases, it is more likely to have more arcs, hence larger relative density for a given
realization of X points in $T(\mathcal{Y})$.

Figure 3: Realizations of segregation (left), CSR (middle), and association (right) for
$|\mathcal{Y}| = 3$ and $|\mathcal{X}| = 20$. Y points are at the vertices of the triangle, and X points are squares.
The numbers of arcs with r = 1 are 98, 53, and 37, respectively. So, relative density values
are .258, .139, and .097, respectively.

4 Asymptotic Distribution of the Relative Density



We first describe the null and alternative patterns we consider briefly, and then provide the 
asymptotic distribution of the relative density for these patterns. 

There are two major types of asymptotic structures for spatial data (Lahiri (1996)). In
the first, any two observations are required to be at least a fixed distance apart, hence as
the number of observations increases, the region on which the process is observed eventually
becomes unbounded. This type of sampling structure is called "increasing domain asymptotics".
In the second type, the region of interest is a fixed bounded region and more and
more points are observed in this region. Hence the minimum distance between data points
tends to zero as the sample size tends to infinity. This type of structure is called "infill
asymptotics", due to Cressie (1991). The sampling structure for our asymptotic analysis is
infill, as only the size of the type X process tends to infinity, while the support, the convex
hull of a given set of points from the type Y process, $C_H(\mathcal{Y})$, is a fixed bounded region.



4.1 Null and Alternative Patterns 

For statistical testing for segregation and association, the null hypothesis is generally some
form of complete spatial randomness; thus we consider

$$H_o : X_i \stackrel{iid}{\sim} \mathcal{U}(T(\mathcal{Y})).$$

If it is desired to have the sample size be a random variable, we may consider a spatial
Poisson point process on $T(\mathcal{Y})$ as our null hypothesis.



Geometry Invariance Property 

We first present a "geometry invariance" result that will simplify our calculations by allowing 
us to consider the special case of the equilateral triangle. 

Theorem 1: Let $\mathcal{Y} = \{y_1, y_2, y_3\} \subset \mathbb{R}^2$ be three non-collinear points. For $i = 1, \ldots, n$
let $X_i \stackrel{iid}{\sim} F = \mathcal{U}(T(\mathcal{Y}))$, the uniform distribution on the triangle $T(\mathcal{Y})$. Then for any
$r \in [0, 1]$ the distribution of $\rho_n(r) := \rho(\mathcal{X}_n; h, N_{CS}^r)$ is independent of $\mathcal{Y}$, hence the geometry
of $T(\mathcal{Y})$.

Based on Theorem 1 and our uniform null hypothesis, we may assume that $T(\mathcal{Y})$ is
the standard equilateral triangle with $\mathcal{Y} = \{(0, 0), (1, 0), (1/2, \sqrt{3}/2)\}$, henceforth. For our
r-factor central similarity proximity map and uniform null hypothesis, the asymptotic null
distribution of $\rho_n(r) = \rho(\mathcal{X}_n; h, N_{CS}^r)$ as a function of r can be derived. Let $\mu(r) := E[\rho_n]$,
then $\mu(r) = E[h_{12}]/2 = P(X_2 \in N_{CS}^r(X_1))$ is the probability of an arc occurring between
any two vertices, and let $\nu(r) := \mathrm{Cov}[h_{12}, h_{13}]$.

We define two simple classes of alternatives, $H^S_\varepsilon$ and $H^A_\varepsilon$ with $\varepsilon \in (0, \sqrt{3}/3)$, for segregation
and association, respectively. See also Figure 4. For $y \in \mathcal{Y}$, let $e(y)$ denote
the edge of $T(\mathcal{Y})$ opposite vertex $y$, and for $x \in T(\mathcal{Y})$ let $\ell_y(x)$ denote the line parallel
to $e(y)$ through $x$. Then define $T(y, \varepsilon) = \{x \in T(\mathcal{Y}) : d(y, \ell_y(x)) \le \varepsilon\}$. Let $H^S_\varepsilon$ be
the model under which $X_i \stackrel{iid}{\sim} \mathcal{U}\left(T(\mathcal{Y}) \setminus \cup_{y \in \mathcal{Y}} T(y, \varepsilon)\right)$ and $H^A_\varepsilon$ be the model under which
$X_i \stackrel{iid}{\sim} \mathcal{U}\left(\cup_{y \in \mathcal{Y}} T(y, \sqrt{3}/3 - \varepsilon)\right)$.

Figure 4: An example for the segregation alternative for a particular $\varepsilon$ (shaded region),
and its complement is for the association alternative (unshaded region) on the standard
equilateral triangle.

The shaded region in Figure 4 is the support for segregation
for a particular $\varepsilon$ value; and its complement is the support for the association alternative
with $\sqrt{3}/3 - \varepsilon$. Thus the segregation model excludes the possibility of any $X_i$ occurring
near a $y_j$, and the association model requires that all $X_i$ occur near a $y_j$. The $\sqrt{3}/3 - \varepsilon$ in
the definition of the association alternative is so that $\varepsilon = 0$ yields $H_o$ under both classes of
alternatives. We consider these types of alternatives among many other possibilities, since
relative density is geometry invariant for these alternatives as the alternatives are defined
with parallel lines to the edges.
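The two alternatives can be simulated by rejection sampling from the uniform distribution on the standard equilateral triangle. The sketch below is illustrative and rests on one observation that is derived here, not quoted from the article: if $\lambda_y(x)$ is the barycentric coordinate of $x$ at vertex $y$, then $d(y, \ell_y(x)) = (1 - \lambda_y(x)) \sqrt{3}/2$, so the segregation support is $\max_y \lambda_y(x) < 1 - 2\varepsilon/\sqrt{3}$ and the association support is $\max_y \lambda_y(x) \ge 1/3 + 2\varepsilon/\sqrt{3}$. All function names are hypothetical.

```python
import math
import random

SQRT3 = math.sqrt(3.0)
# Standard equilateral triangle T(Y) used in the text.
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, SQRT3 / 2.0))


def uniform_barycentric(rng):
    """Barycentric coordinates of a uniform point in the triangle (uniform spacings)."""
    u, v = sorted((rng.random(), rng.random()))
    return (u, v - u, 1.0 - v)


def to_cartesian(lam):
    """Convert barycentric coordinates to a point of the standard triangle."""
    px = sum(l * vx for l, (vx, _) in zip(lam, VERTICES))
    py = sum(l * vy for l, (_, vy) in zip(lam, VERTICES))
    return (px, py)


def in_support(lam, eps, kind):
    """True if the point with barycentric coordinates lam is in the support."""
    top = max(lam)
    if kind == "segregation":      # outside every corner triangle T(y, eps)
        return top < 1.0 - 2.0 * eps / SQRT3
    if kind == "association":      # inside some T(y, sqrt(3)/3 - eps)
        return top >= 1.0 / 3.0 + 2.0 * eps / SQRT3
    raise ValueError("kind must be 'segregation' or 'association'")


def sample_alternative(n, eps, kind, rng=None):
    """Draw n points from the chosen alternative by rejection sampling."""
    if not 0.0 <= eps < SQRT3 / 3.0:
        raise ValueError("eps must lie in [0, sqrt(3)/3)")
    rng = rng or random.Random()
    points = []
    while len(points) < n:
        lam = uniform_barycentric(rng)
        if in_support(lam, eps, kind):
            points.append(to_cartesian(lam))
    return points
```

Rejection sampling is chosen here only for simplicity; its acceptance rate drops as $\varepsilon$ approaches $\sqrt{3}/3$, so a direct sampler would be preferable for extreme alternatives.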

Remark: These definitions of the alternatives are given for the standard equilateral
triangle. The geometry invariance result of Theorem 1 from Section 4.1 still holds under
the alternatives, in the following sense. If, in an arbitrary triangle, a small percentage
$\delta \cdot 100\%$, where $\delta \in (0, 4/9)$, of the area is carved away as forbidden from each vertex
using line segments parallel to the opposite edge, then under the transformation to the
standard equilateral triangle this will result in the alternative $H^S_{\sqrt{3\delta/4}}$. This argument is
for segregation with $\delta < 1/4$; a similar construction is available for the other cases.
4.2 Asymptotic Normality Under the Null Hypothesis 

By detailed geometric probability calculations provided in the Appendix, the mean and
the asymptotic variance of the relative density of the r-factor proximity catch digraph can
be calculated explicitly. The central limit theorem for U-statistics then establishes the
asymptotic normality under the uniform null hypothesis. These results are summarized in
the following theorem.

Theorem 2: For $r \in (0, 1]$, the relative density of the r-factor central similarity proximity
digraph converges in law to the normal distribution, i.e., as $n \rightarrow \infty$,

$$\frac{\sqrt{n}\,(\rho_n(r) - \mu(r))}{\sqrt{\nu(r)}} \xrightarrow{\mathcal{L}} \mathcal{N}(0, 1) \tag{7}$$

where

$$\mu(r) = r^2/6 \tag{8}$$

and

$$\nu(r) = \frac{r^4\,(6\,r^5 - 3\,r^4 - 25\,r^3 + r^2 + 49\,r + 14)}{45\,(r + 1)(2\,r + 1)(r + 2)}. \tag{9}$$

For $r = 0$, $\rho_n(r)$ is degenerate for all $n > 1$.
See the Appendix for the derivation.

Consider the form of the mean and the variance functions, which are depicted in Figure
5. Note that $\mu(r)$ is monotonically increasing in r, since $N_{CS}^r(x)$ increases with r for all
$x \in T(\mathcal{Y})^o$. Note also that $\mu(r)$ is continuous in r with $\mu(r = 1) = 1/6$ and $\mu(r = 0) = 0$.

Regarding the asymptotic variance, note that $\nu(r)$ is continuous in r with $\nu(r = 1) =
7/135$ and $\nu(r = 0) = 0$ (there are no arcs when $r = 0$ a.s.), which explains why $\rho_n(r = 0)$
is degenerate.
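The closed forms in Theorem 2, $\mu(r) = r^2/6$ and $\nu(r) = r^4 (6 r^5 - 3 r^4 - 25 r^3 + r^2 + 49 r + 14) / (45 (r + 1)(2 r + 1)(r + 2))$, are easy to evaluate exactly. The following Python sketch (with the hypothetical function names `null_mean` and `null_variance`) uses rational arithmetic so that the special values quoted in the text can be checked without rounding.

```python
from fractions import Fraction


def null_mean(r):
    """Asymptotic null mean mu(r) = r^2 / 6 of the relative density."""
    return r * r / 6


def null_variance(r):
    """Asymptotic null variance nu(r) from Theorem 2."""
    numerator = r ** 4 * (6 * r ** 5 - 3 * r ** 4 - 25 * r ** 3 + r ** 2 + 49 * r + 14)
    denominator = 45 * (r + 1) * (2 * r + 1) * (r + 2)
    return numerator / denominator
```

Passing `Fraction` arguments keeps the results exact; for example, `null_variance(Fraction(1))` evaluates to 7/135 and `null_variance(Fraction(1, 2))` to 19/2880, the variance constant that appears in the r = 1/2 example.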



Figure 5: Result of Theorem 2: asymptotic null mean $\mu(r)$ (left) and variance
$\nu(r)$ (right), from Equations (8) and (9), respectively.

As an example of the limiting distribution, r = 1/2 yields

$$\frac{\sqrt{n}\,(\rho_n(1/2) - \mu(1/2))}{\sqrt{\nu(1/2)}} = \sqrt{\frac{2880\,n}{19}}\,\left(\rho_n(1/2) - \frac{1}{24}\right) \xrightarrow{\mathcal{L}} \mathcal{N}(0, 1)$$

or equivalently,

$$\rho_n(1/2) \stackrel{\text{approx}}{\sim} \mathcal{N}\left(\frac{1}{24}, \frac{19}{2880\,n}\right).$$

The finite sample variance and skewness may be derived analytically in much the same
way as was $\mathrm{Cov}[h_{12}, h_{13}]$ for the asymptotic variance. In fact, the exact distribution of
$\rho_n(r)$ is, in principle, available by successively conditioning on the values of the $X_i$. Alas,
while the joint distribution of $h_{12}, h_{13}$ is available, the joint distribution of $\{h_{ij}\}_{1 \le i < j \le n}$,
and hence the calculation for the exact distribution of $\rho_n(r)$, is extraordinarily tedious and
lengthy for even small values of n.

Figure 6 indicates that, for r = 1/2, the normal approximation is accurate even for small
n (although kurtosis and skewness may be indicated for n = 10, 20). Figure 7 demonstrates,
however, that the smaller the value of r the more severe the skewness of the probability
density.

Figure 6: Depicted are $\rho_n(1/2) \stackrel{\text{approx}}{\sim} \mathcal{N}\left(\frac{1}{24}, \frac{19}{2880\,n}\right)$ for n = 10, 20, 100 (left to right).
Histograms are based on 1000 Monte Carlo replicates. Solid curves represent the approximating
normal densities given in Theorem 2. Note that the vertical axes are differently
scaled.

Figure 7: Depicted are the histograms for 10000 Monte Carlo replicates of $\rho_{10}(1/4)$ (left),
$\rho_{10}(3/4)$ (middle), and $\rho_{10}(1)$ (right) indicating severe small sample skewness for small
values of r.

4.3 Asymptotic Normality Under the Alternatives

Asymptotic normality of the relative density of the proximity catch digraph can be established
under the alternative hypotheses of segregation and association by the same method
as under the null hypothesis. Let $E_\varepsilon[\cdot]$ be the expectation with respect to the uniform
distribution under the segregation and association alternatives with $\varepsilon \in (0, \sqrt{3}/3)$.

Theorem 3: Let $\mu_S(r, \varepsilon)$ (respectively $\mu_A(r, \varepsilon)$) be the mean and $\nu_S(r, \varepsilon)$ (respectively $\nu_A(r, \varepsilon)$) be the
covariance, $\mathrm{Cov}[h_{12}, h_{13}]$, for $r \in (0, 1]$ and $\varepsilon \in (0, \sqrt{3}/3)$ under segregation (respectively association).
Then under $H^S_\varepsilon$, $\sqrt{n}\,(\rho_n(r) - \mu_S(r, \varepsilon)) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \nu_S(r, \varepsilon))$ for the values of the pair
$(r, \varepsilon)$ for which $\nu_S(r, \varepsilon) > 0$. $\rho_n(r)$ is degenerate when $\nu_S(r, \varepsilon) = 0$. Likewise, under
$H^A_\varepsilon$, $\sqrt{n}\,(\rho_n(r) - \mu_A(r, \varepsilon)) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \nu_A(r, \varepsilon))$ for the values of the pair $(r, \varepsilon)$ for which
$\nu_A(r, \varepsilon) > 0$. $\rho_n(r)$ is degenerate when $\nu_A(r, \varepsilon) = 0$.

Notice that under the association alternatives any $r \in (0, 1]$ yields asymptotic normality
for all $\varepsilon \in (0, \sqrt{3}/3)$, while under the segregation alternatives only r = 1 yields this universal
asymptotic normality.



5 The Test and Analysis 

The relative density of the central similarity proximity catch digraph is a test statistic for
the segregation/association alternative; rejecting for extreme values of $\rho_n(r)$ is appropriate
since under segregation, we expect $\rho_n(r)$ to be large, while under association, we expect
$\rho_n(r)$ to be small. Using the test statistic

$$R(r) = \frac{\sqrt{n}\,(\rho_n(r) - \mu(r))}{\sqrt{\nu(r)}}$$

which is the normalized relative density, the asymptotic critical value for the one-sided level
$\alpha$ test against segregation is given by

$$z_\alpha = \Phi^{-1}(1 - \alpha).$$

Against segregation, the test rejects for $R(r) > z_\alpha$ and against association, the test rejects
for $R(r) < z_{1-\alpha}$. For the example patterns in Figure 3, $R(r = 1) = 1.792$, $-.534$, and
$-1.361$, respectively.
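The test is simple to carry out once the relative density is known. The Python sketch below is a minimal illustration with hypothetical names (`normalized_relative_density`, `csr_test`); it standardizes $\rho_n(r)$ by the null mean $r^2/6$ and null variance $\nu(r)$ of Theorem 2 and compares the result with the standard normal quantiles $\Phi^{-1}(1 - \alpha)$ and $\Phi^{-1}(\alpha)$.

```python
import math
from statistics import NormalDist


def null_mean(r):
    """Asymptotic null mean mu(r) = r^2 / 6."""
    return r * r / 6.0


def null_variance(r):
    """Asymptotic null variance nu(r) from Theorem 2."""
    numerator = r ** 4 * (6 * r ** 5 - 3 * r ** 4 - 25 * r ** 3 + r ** 2 + 49 * r + 14)
    return numerator / (45.0 * (r + 1) * (2 * r + 1) * (r + 2))


def normalized_relative_density(rho_n, n, r):
    """Test statistic R(r) = sqrt(n) (rho_n - mu(r)) / sqrt(nu(r)), for r in (0, 1]."""
    if not 0.0 < r <= 1.0:
        raise ValueError("r must lie in (0, 1]")
    return math.sqrt(n) * (rho_n - null_mean(r)) / math.sqrt(null_variance(r))


def csr_test(rho_n, n, r, alpha=0.05):
    """One-sided asymptotic tests of CSR against segregation and association."""
    stat = normalized_relative_density(rho_n, n, r)
    z_upper = NormalDist().inv_cdf(1.0 - alpha)   # z_alpha
    z_lower = NormalDist().inv_cdf(alpha)         # z_{1 - alpha}
    return {
        "R": stat,
        "reject_for_segregation": stat > z_upper,
        "reject_for_association": stat < z_lower,
    }
```

Applied to the arc counts reported for Figure 3 (98, 53, and 37 arcs among n = 20 points with r = 1, i.e. relative densities 98/380, 53/380, and 37/380), this reproduces the quoted values of R up to rounding; at $\alpha = .05$ only the first pattern exceeds the critical value for segregation.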



5.1 Consistency of the Tests Under the Alternatives 

Theorem 4: The test against $H^S_\varepsilon$ which rejects for $R(r) > z_\alpha$ and the test against $H^A_\varepsilon$
which rejects for $R(r) < z_{1-\alpha}$ are consistent for $r \in (0, 1]$ and $\varepsilon \in (0, \sqrt{3}/3)$.

In fact, the analysis of the means under the alternatives reveals more than what is
required for consistency. Under segregation, the analysis indicates that $\mu_S(r, \varepsilon_1) < \mu_S(r, \varepsilon_2)$
for $\varepsilon_1 < \varepsilon_2$. On the other hand, under association, the analysis indicates that $\mu_A(r, \varepsilon_1) >
\mu_A(r, \varepsilon_2)$ for $\varepsilon_1 < \varepsilon_2$.

5.2 Monte Carlo Power Analysis



In this section, we assess the finite sample behaviour of the relative density using Monte
Carlo simulations for testing CSR against segregation or association. We provide the kernel 
density estimates, empirical significance levels, and empirical power estimates under the 
null case and various segregation and association alternatives. 
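A Monte Carlo experiment of this kind can be sketched compactly. The Python code below is illustrative only and is not the simulation code behind the article's figures or tables: the function names, the seed, and the replicate counts are assumptions. It works directly in barycentric coordinates (which is legitimate under the uniform null by the geometry invariance of Theorem 1) and uses the barycentric form of the catch relation, $z \in N_{CS}^r(x)$ iff $\lambda_j(z) \ge \lambda_j(x) - r \min_k \lambda_k(x)$ for all $j$, which is a reformulation derived from properties (i) to (iii) of the proximity region.

```python
import math
import random
from statistics import NormalDist


def uniform_barycentric(rng):
    """Barycentric coordinates of a uniform point in a triangle."""
    u, v = sorted((rng.random(), rng.random()))
    return (u, v - u, 1.0 - v)


def catches(lam_x, lam_z, r):
    """True if the point z lies in the r-factor proximity region of x."""
    shrink = r * min(lam_x)
    return all(lz >= lx - shrink for lx, lz in zip(lam_x, lam_z))


def relative_density(points, r):
    """Arcs divided by n(n - 1) for the r-factor central similarity PCD."""
    n = len(points)
    arcs = sum(catches(p, q, r) for p in points for q in points if p is not q)
    return arcs / (n * (n - 1))


def empirical_size(n, r, replicates, alpha=0.05, seed=0):
    """Estimate the null rejection rate (against segregation) and E[rho_n]."""
    rng = random.Random(seed)
    mu = r * r / 6.0
    nu = (r ** 4 * (6 * r ** 5 - 3 * r ** 4 - 25 * r ** 3 + r ** 2 + 49 * r + 14)
          / (45.0 * (r + 1) * (2 * r + 1) * (r + 2)))
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    rejections = 0
    total = 0.0
    for _ in range(replicates):
        pts = [uniform_barycentric(rng) for _ in range(n)]
        rho = relative_density(pts, r)
        total += rho
        if math.sqrt(n) * (rho - mu) / math.sqrt(nu) > z_alpha:
            rejections += 1
    return rejections / replicates, total / replicates
```

For example, `empirical_size(10, 1.0, 2000, seed=1)` returns an estimated significance level together with the average relative density, which should be close to $\mu(1) = 1/6$. Power against an alternative can be estimated the same way by replacing the uniform sampler with a sampler for that alternative.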

5.2.1 Monte Carlo Power Analysis for Segregation Alternatives 

In Figures 8 and 9, we present the kernel density estimates under $H_o$ and $H^S_\varepsilon$ with $\varepsilon =
\sqrt{3}/8, \sqrt{3}/4, 2\sqrt{3}/7$. Observe that with n = 10 and $\varepsilon = \sqrt{3}/8$, the density estimates are
very similar, implying small power; and as $\varepsilon$ gets larger, the separation between the null and
alternative curves gets larger, hence the power gets larger. With n = 10, 10000 Monte Carlo
replicates yield power estimates $\widehat{\beta}^S_{mc}(\varepsilon) = .0994, .9777, 1.000$, respectively. With n = 100,
there is more separation between the null and alternative curves at each $\varepsilon$, which implies
that power increases as $\varepsilon$ increases. With n = 100, 1000 Monte Carlo replicates yield
$\widehat{\beta}^S_{mc}(\varepsilon) = .5444, 1.000, 1.000$.

Figure 8: Kernel density estimates for the null (solid) and the segregation alternative
(dashed) with r = 1/2, n = 10, N = 10000, and $\varepsilon = \sqrt{3}/8$ (left), $\varepsilon = \sqrt{3}/4$ (middle), and
$\varepsilon = 2\sqrt{3}/7$ (right).




Figure 9: Kernel density estimates for the null (solid) and the segregation alternative $H^S_\varepsilon$
(dashed) for r = 1/2 with n = 10 and N = 10000 (left) and n = 100 and N = 1000 (right).
The horizontal axes show the relative density.



For a given alternative and sample size, we may consider analyzing the power of the test
(using the asymptotic critical value, i.e., the normal approximation) as a function of r.
Figure 10 presents a Monte Carlo investigation of power against $H^S_{\sqrt{3}/8}$ and $H^S_{\sqrt{3}/4}$ as a function
of r for n = 10. The corresponding empirical significance levels and power estimates are
presented in Table 1. The empirical significance levels for n = 10, $\widehat{\alpha}_{n=10}$, are all greater than .05,
with the smallest being .0868 at r = 1.0, which has the empirical power $\widehat{\beta}_{10}(\sqrt{3}/8) = .2289$,
$\widehat{\beta}_{10}(\sqrt{3}/4) = .9969$. However, the empirical significance levels imply that n = 10 is not large
enough for the normal approximation. Notice that as n gets larger, the empirical significance
levels get closer to .05 (except for r = 0.1), but still are all greater than .05, which indicates
that for $n \le 100$, the test is liberal in rejecting $H_o$ against segregation. Furthermore, as
n increases, for fixed $\varepsilon$ the empirical power estimates increase and the empirical significance
levels get closer to .05; and for fixed n, as r increases, power estimates get larger. Therefore,
for segregation, we recommend the use of large r values ($r \le 1.0$).

Figure 10: Monte Carlo power using the asymptotic critical value against segregation
alternatives $H^S_{\sqrt{3}/8}$ (left), $H^S_{\sqrt{3}/4}$ (right), as a function of r, for n = 10 and N = 10000.
The circles represent the empirical significance levels while triangles represent the empirical
power values.



5.2.2 Monte Carlo Power Analysis for Association Alternatives 

In Figures 11 and 12, we present the kernel density estimates under $H_0$ and $H^A_\varepsilon$ with
$\varepsilon = \sqrt{3}/21, \sqrt{3}/12, 5\sqrt{3}/24$. Observe that with n = 10, the density estimates are very
similar for all $\varepsilon$ values (with slightly more separation for larger $\varepsilon$), implying small power.
10000 Monte Carlo replicates yield power estimates $\hat\beta^A_{mc} \approx 0$. With n = 100, there is
more separation between the null and alternative curves at each $\varepsilon$, which implies that
power increases as $\varepsilon$ increases. 1000 Monte Carlo replicates yield $\hat\beta^A_{mc} = .324, .634, .634$,
respectively.

For a given alternative and sample size, we may consider analyzing the power of the test
(using the asymptotic critical value) as a function of r.

The empirical significance levels and power estimates against $H^A_\varepsilon$, with $\varepsilon = \sqrt{3}/12, 5\sqrt{3}/24$,
as a function of r for n = 10 are presented in Table 2. The empirical significance level closest
to .05 occurs at r = .6 (it is much smaller for the other r values), where the empirical power is
$\hat\beta_{10}(\sqrt{3}/12) = .1181$ and $\hat\beta_{10}(5\sqrt{3}/24) = .1187$. However, the empirical significance levels
imply that n = 10 is not large enough for the normal approximation. With n = 20, the empirical
significance levels get closer to .05 for r = .3, .4, .5, .7, .8, .9, 1.0, with the closest at r = .4,
which has empirical power .1497. With n = 100, the empirical significance levels are
$\approx .05$ for $r \ge .3$ and the highest empirical power is .997 at r = 1.0. Note that as n increases,
the empirical power estimates increase for $r \ge .2$ and the empirical significance levels get
closer to .05 for $r \ge .5$. This analysis indicates that in the one triangle case, the sample size
should be really large ($n \ge 100$) for the normal approximation to be appropriate. Moreover,
the smaller the r value, the larger the sample needed for the normal approximation to be
appropriate. Therefore, we recommend the use of large r values ($r \lesssim 1.0$) for association.

r                                  .1     .2     .3     .4     .5     .6     .7     .8     .9     1.0

n = 10, N = 10000
$\hat\alpha_S(n)$                  .0932  .1916  .1740  .1533  .1101  .0979  .1035  .0945  .0883  .0868
$\hat\beta^S_n(r,\sqrt{3}/8)$      .1286  .2630  .2917  .2811  .2305  .2342  .2526  .2405  .2334  .2289
$\hat\beta^S_n(r,\sqrt{3}/4)$      .5821  .9011  .9824  .9945  .9967  .9979  .9990  .9985  .9983  .9969

n = 20, N = 10000
$\hat\alpha_S(n)$                  .2018  .1707  .1151  .1099  .0898  .0864  .0866  .0800  .0786  .0763
$\hat\beta^S_n(r,\sqrt{3}/8)$      .2931  .3245  .2744  .3021  .2844  .2926  .3117  .3113  .3119  .3038

n = 100, N = 1000
$\hat\alpha_S(n)$                  .155   .101   .080   .077   .075   .066   .065   .063   .066   .069
$\hat\beta^S_n(r,\sqrt{3}/8)$      .574   .574   .612   .655   .709   .742   .774   .786   .793   .793

Table 1: The empirical significance level and empirical power values under $H^S_\varepsilon$ for $\varepsilon =
\sqrt{3}/8, \sqrt{3}/4$ at $\alpha = .05$.

Figure 11: Kernel density estimates for the null (solid) and the association alternative $H^A_\varepsilon$
(dashed) for r = 1/2 with n = 10, N = 10000 and $\varepsilon = \sqrt{3}/21$ (left), $\varepsilon = \sqrt{3}/12$ (middle),
$\varepsilon = 5\sqrt{3}/24$ (right).
Figure 12: Kernel density estimates (horizontal axis: relative density) for the null (solid) and the association alternative
(dashed) for r = 1/2 with n = 100, N = 1000 and $\varepsilon = \sqrt{3}/21$ (left), $\varepsilon = \sqrt{3}/12$ (right).

r                                  .1     .2     .3     .4     .5     .6     .7     .8     .9     1.0

n = 10, N = 10000
$\hat\alpha_A(n)$                  -      -      -      -      -      .0465  .0164  .0223  .0209  .0339
$\hat\beta^A_n(r,\sqrt{3}/12)$     -      -      -      -      -      .1181  .0569  .0831  .0882  .1490
$\hat\beta^A_n(r,5\sqrt{3}/24)$    -      -      -      -      -      .1187  .0581  .0863  .0985  .1771

n = 20, N = 10000
$\hat\alpha_A(n)$                  .6603  .2203  .1069  .0496  .0338  .0301  .0290  .0267  .0333  .0372
$\hat\beta^A_n(r,\cdot)$           .7398  .3326  .2154  .1497  .1442  .1608  .1818  .2084  .2663  .3167

n = 100, N = 1000
$\hat\alpha_A(n)$                  .169   .075   .053   .047   .049   .044   .040   .044   .049   .049
$\hat\beta^A_n(r,\sqrt{3}/12)$     .433   .399   .460   .559   .687   .789   .887   .938   .977   .997

Table 2: The empirical significance level and empirical power values under $H^A_\varepsilon$ for $\varepsilon =
5\sqrt{3}/24, \sqrt{3}/12, \sqrt{3}/21$ with N = 10000, and n = 10 at $\alpha = .05$.



5.2.3 Pitman Asymptotic Efficiency Under the Alternatives 

Pitman asymptotic efficiency (PAE) provides for an investigation of "local asymptotic
power" (local around $H_0$). This involves the limit as $n \to \infty$, as well as the limit as
$\varepsilon \to 0$. See the proof of Theorem 3 for the ranges of r and $\varepsilon$ for which relative density is
continuous as n goes to $\infty$. A detailed discussion of PAE can be found in Eeden (1963) and
Kendall and Stuart (1979). For segregation or association alternatives the PAE is given by

$$\mathrm{PAE}(\rho_n(r)) = \frac{\left(\mu^{(k)}(r,\varepsilon=0)\right)^2}{\nu(r)}$$

where k is the minimum order of the derivative with respect
to $\varepsilon$ for which $\mu^{(k)}(r,\varepsilon=0) \neq 0$. That is, $\mu^{(k)}(r,\varepsilon=0) \neq 0$ but $\mu^{(l)}(r,\varepsilon=0) = 0$ for
$l = 1,2,\ldots,k-1$. Then under the segregation alternative $H^S_\varepsilon$ and the association alternative $H^A_\varepsilon$,
the PAE of $\rho_n(r)$ is given by

$$\mathrm{PAE}^S(r) = \frac{\left(\mu_S''(r,\varepsilon=0)\right)^2}{\nu(r)} \quad\text{and}\quad \mathrm{PAE}^A(r) = \frac{\left(\mu_A''(r,\varepsilon=0)\right)^2}{\nu(r)},$$

respectively, since $\mu_S'(r,\varepsilon=0) = \mu_A'(r,\varepsilon=0) = 0$. Equation (9) provides the denominator;
the numerator requires $\mu_S(r,\varepsilon)$ and $\mu_A(r,\varepsilon)$, which are provided in the Appendix, where we
only use the intervals of r that do not vanish as $\varepsilon \to 0$.

In Figure 13, we present the PAE as a function of r for both segregation and association.

Notice that $\lim_{r\to 0} \mathrm{PAE}^S(r) = 320/7 \approx 45.7143$, $\operatorname{argsup}_{r\in(0,1]} \mathrm{PAE}^S(r) = 1.0$, and
$\mathrm{PAE}^S(r=1) = 960/7 \approx 137.1429$. Based on the PAE analysis, we suggest, for large n
and small $\varepsilon$, choosing r large (i.e., r = 1) for testing against segregation.

Notice that $\lim_{r\to 0} \mathrm{PAE}^A(r) = 72000/7 \approx 10285.7143$, $\mathrm{PAE}^A(r=1) = 61440/7 \approx
8777.1429$, $\operatorname{arginf}_{r\in(0,1]} \mathrm{PAE}^A(r) \approx .4566$ with $\mathrm{PAE}^A(r \approx .4566) \approx 6191.0939$. Based on
the asymptotic efficiency analysis, we suggest, for large n and small $\varepsilon$, choosing r small
for testing against association. However, for small and moderate values of n the normal
approximation is not appropriate due to the skewness in the density of $\rho_n(r)$. Therefore,
for small and moderate n, we suggest large r values ($r \lesssim 1$).
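As a small consistency check on the quoted PAE values, the denominator $\nu(r)$ can be evaluated exactly and multiplied by the reported efficiencies to recover the squared second derivatives they imply. The sketch below uses only the variance formula from the Appendix and the four PAE values stated in the text. The helper names are hypothetical, and the "implied" quantities are back-calculated here for illustration rather than taken from the paper.

```python
from fractions import Fraction


def nu(r):
    """Null asymptotic variance nu(r) from the Appendix, in exact arithmetic."""
    r = Fraction(r)
    num = r**4 * (6 * r**5 - 3 * r**4 - 25 * r**3 + r**2 + 49 * r + 14)
    den = 45 * (r + 1) * (2 * r + 1) * (r + 2)
    return num / den


def nu_over_r4(r):
    """nu(r) / r^4, which stays finite as r -> 0 (well defined at r = 0)."""
    r = Fraction(r)
    num = 6 * r**5 - 3 * r**4 - 25 * r**3 + r**2 + 49 * r + 14
    den = 45 * (r + 1) * (2 * r + 1) * (r + 2)
    return num / den


# PAE values quoted in the text.
PAE_S_AT_1 = Fraction(960, 7)
PAE_A_AT_1 = Fraction(61440, 7)
PAE_S_LIMIT_0 = Fraction(320, 7)
PAE_A_LIMIT_0 = Fraction(72000, 7)

# Squared second derivative implied at r = 1: PAE * nu(1).
implied_sq_S_at_1 = PAE_S_AT_1 * nu(1)
implied_sq_A_at_1 = PAE_A_AT_1 * nu(1)

# Limit of (second derivative / r^2)^2 implied as r -> 0.
implied_sq_S_at_0 = PAE_S_LIMIT_0 * nu_over_r4(0)
implied_sq_A_at_0 = PAE_A_LIMIT_0 * nu_over_r4(0)
```

The reported fractions pair with $\nu(1) = 7/135$ and $\lim_{r\to 0}\nu(r)/r^4 = 7/45$ to give simple rational values, which is at least consistent with the closed forms being exact.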




Figure 13: Pitman asymptotic efficiency against segregation (left) and association (right) 
as a function of r. 

5.3 The Case with Multiple Delaunay Triangles 

Suppose $\mathcal{Y}$ is a finite collection of points in $\mathbb{R}^2$ with $|\mathcal{Y}| \ge 3$. Consider the Delaunay
triangulation (assumed to exist) of $\mathcal{Y}$, where $T_j$ denotes the j-th Delaunay triangle, J denotes
the number of triangles, and $C_H(\mathcal{Y})$ denotes the convex hull of $\mathcal{Y}$. We wish to investigate

$H_0: X_i \stackrel{iid}{\sim} \mathcal{U}(C_H(\mathcal{Y}))$ against segregation and association alternatives.

Figure 1 is the graph of realizations of n = 1000 observations which are independent
and identically distributed according to $\mathcal{U}(C_H(\mathcal{Y}))$ for $|\mathcal{Y}| = 10$ and J = 13 and under
segregation and association for the same $\mathcal{Y}$.

The digraph D is constructed using $N^r_{CS}(j,\cdot) = N^r_{\mathcal{Y}_j}(\cdot)$ as described above, where for
$X_i \in T_j$ the three points in $\mathcal{Y}$ defining the Delaunay triangle $T_j$ are used as $\mathcal{Y}_j$. Letting
$w_j := A(T_j) / A(C_H(\mathcal{Y}))$ with $A(\cdot)$ being the area functional, we obtain the following as a
corollary to Theorem 2.

Corollary 1: The asymptotic null distribution for $\rho_n(r, J)$ conditional on $W = \{w_1, \ldots, w_J\}$
for $r \in (0, 1]$ is given by $\mathcal{N}(\mu(r, J), \nu(r, J)/n)$ provided that $\nu(r, J) > 0$ with

$$\mu(r,J) := \mu(r) \sum_{j=1}^{J} w_j^2 \quad\text{and}\quad \nu(r,J) := \nu(r) \sum_{j=1}^{J} w_j^3 + 4\,\mu(r)^2 \left[\sum_{j=1}^{J} w_j^3 - \Big(\sum_{j=1}^{J} w_j^2\Big)^2\right], \qquad (11)$$

where $\mu(r)$ and $\nu(r)$ are given by Equations (8) and (9), respectively.



By an appropriate application of Jensen's inequality, we see that $\sum_{j=1}^{J} w_j^3 \ge \big(\sum_{j=1}^{J} w_j^2\big)^2$.
Therefore, the covariance $\nu(r, J) = 0$ iff both $\nu(r) = 0$ and $\sum_{j=1}^{J} w_j^3 = \big(\sum_{j=1}^{J} w_j^2\big)^2$ hold,
so asymptotic normality may hold even when $\nu(r) = 0$ (provided that $\mu(r) > 0$).
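The Jensen step can be spelled out as follows. This is a short sketch, assuming (as holds when the Delaunay triangles partition the convex hull) that the weights are nonnegative and sum to one; the auxiliary random variable $W$ is introduced here only for illustration and does not appear elsewhere in the article. Let $W$ take the value $w_j$ with probability $w_j$, $j = 1, \ldots, J$. Then

$$\mathbf{E}[W] = \sum_{j=1}^{J} w_j \cdot w_j = \sum_{j=1}^{J} w_j^2, \qquad \mathbf{E}[W^2] = \sum_{j=1}^{J} w_j \cdot w_j^2 = \sum_{j=1}^{J} w_j^3,$$

and Jensen's inequality for the convex function $t \mapsto t^2$ gives

$$\sum_{j=1}^{J} w_j^3 = \mathbf{E}[W^2] \ge \big(\mathbf{E}[W]\big)^2 = \Big(\sum_{j=1}^{J} w_j^2\Big)^2,$$

with equality exactly when $W$ is almost surely constant, that is, when all triangles with positive area have the same relative area $w_j = 1/J$.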

Similarly, for the segregation (association) alternatives where $4\varepsilon^2/3 \cdot 100\%$ of the area
around the vertices of each triangle is forbidden (allowed), we obtain the above asymptotic
distribution of $\rho_n(r, J)$ with $\mu(r, J)$ being replaced by $\mu_S(r, J, \varepsilon)$, $\nu(r, J)$ by $\nu_S(r, J, \varepsilon)$, $\mu(r)$
by $\mu_S(r, \varepsilon)$, and $\nu(r)$ by $\nu_S(r, \varepsilon)$. Likewise for association.

Thus in the case of J > 1, we have a (conditional) test of $H_0: X_i \stackrel{iid}{\sim} \mathcal{U}(C_H(\mathcal{Y}))$
which once again rejects against segregation for large values of $\rho_n(r, J)$ and rejects against
association for small values of $\rho_n(r, J)$.
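The conditional test can be written down compactly once the relative areas are known. The following is an illustrative sketch only: the function names and the returned tuple are hypothetical, the observed relative density `rho` is assumed to have been computed elsewhere, and the one-sided p-values rely on the normal approximation of Corollary 1, which the simulations above suggest is only adequate for fairly large n.

```python
import math
from statistics import NormalDist


def mu(r):
    """One-triangle null mean, mu(r) = r^2 / 6."""
    return r * r / 6.0


def nu(r):
    """One-triangle null asymptotic variance from the Appendix."""
    num = r**4 * (6 * r**5 - 3 * r**4 - 25 * r**3 + r**2 + 49 * r + 14)
    den = 45 * (r + 1) * (2 * r + 1) * (r + 2)
    return num / den


def mu_J(r, weights):
    """mu(r, J) = mu(r) * sum_j w_j^2."""
    return mu(r) * sum(w**2 for w in weights)


def nu_J(r, weights):
    """nu(r, J) = nu(r) * S3 + 4 mu(r)^2 * (S3 - S2^2)."""
    s2 = sum(w**2 for w in weights)
    s3 = sum(w**3 for w in weights)
    return nu(r) * s3 + 4 * mu(r) ** 2 * (s3 - s2**2)


def conditional_test(rho, n, r, weights):
    """Return (z, p_segregation, p_association) under the normal approximation."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("relative areas must sum to 1")
    variance = nu_J(r, weights)
    if variance <= 0:
        raise ValueError("nu(r, J) must be positive")
    z = math.sqrt(n) * (rho - mu_J(r, weights)) / math.sqrt(variance)
    phi = NormalDist().cdf(z)
    # Large values point to segregation, small values to association.
    return z, 1.0 - phi, phi
```

With a single triangle (`weights = [1.0]`) the two moments reduce to $\mu(r)$ and $\nu(r)$, so the same function also covers the one-triangle case.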

The segregation (with $\delta = 1/16$, i.e., $\varepsilon = \sqrt{3}/8$), null, and association (with $\delta = 1/4$, i.e.,
$\varepsilon = \sqrt{3}/12$) realizations (from left to right) are depicted in Figure 1 with n = 1000. For the
null realization, the p-value p > .34 for all r values relative to the segregation alternative,
also p > .32 for all r values relative to the association alternative. For the segregation
realization, we obtain p < .021 for all r > .2. For the association realization, we obtain
p < .02 for all r > .2 and p = .07 at r = .1. Note that this is only for one realization of $\mathcal{X}_n$.

We repeat the null and alternative realizations 1000 times with n = 100 and n = 500 and
estimate the significance levels and empirical power. The estimated values are presented in
Table 3. With n = 100, the empirical significance levels are all greater than .05 and less than
.10 for r > .6 against both alternatives, and much larger for other values. This analysis suggests
that n = 100 is not large enough for the normal approximation. With n = 500, the empirical
significance levels are around .1 for .3 < r < .5 for segregation and, for association, around
(but slightly larger than) .05 for r > .5. Based on this analysis, we see that, against segregation, our
test is liberal (less liberal for larger r) in rejecting $H_0$ for small and moderate n, while against
association it is slightly liberal for small and moderate n and large r values. For both
alternatives, we suggest the use of large r values. Observe that the poor performance of
relative density in the one-triangle case for association does not persist in the multiple triangle case.
In fact, for the multiple triangle case, the relative density becomes more appropriate for testing against
association than for testing against segregation.

The conditional test presented here is appropriate when $w_j \in W$ are fixed, not random.
An unconditional version requires the joint distribution of the number and relative size
of Delaunay triangles when $\mathcal{Y}$ is, for instance, a Poisson point pattern. Alas, this joint
distribution is not available (Okabe et al. (2000)).



5.3.1 Pitman Asymptotic Efficiency Analysis for Multiple Triangle Case 

The PAE analysis is given for J = 1. For J > 1, the analysis will depend on both the
number of triangles as well as the sizes of the triangles. So the optimal r values with
respect to these efficiency criteria for J = 1 are not necessarily optimal for J > 1, so the
analyses need to be updated, conditional on the values of J and W.

r                                    .1     .2     .3     .4     .5     .6     .7     .8     .9     1.0

n = 100, N = 1000, J = 13
$\hat\alpha_S(n,J)$                  .496   .366   .302   .242   .190   .103   .102   .092   .095   .091
$\hat\beta^S_n(r,\sqrt{3}/8,J)$      .393   .429   .464   .512   .551   .578   .608   .613   .611   .604
$\hat\alpha_A(n,J)$                  .726   .452   .322   .310   .194   .097   .081   .072   .063   .067
$\hat\beta^A_n(r,\sqrt{3}/12,J)$     .452   .426   .443   .555   .567   .667   .721   .809   .857   .906

n = 500, N = 1000, J = 13
$\hat\alpha_S(n,J)$                  .246   .162   .114   .103   .097   .092   .095   .093   .095   .090
$\hat\beta^S_n(r,\sqrt{3}/8,J)$      .829   .947   .982   .988   .995   .995   .997   .998   .997   .997
$\hat\alpha_A(n,J)$                  .255   .117   .077   .067   .052   .059   .061   .054   .056   .058
$\hat\beta^A_n(r,\sqrt{3}/12,J)$     .684   .872   .953   .991   .999   1.000  1.000  1.000  1.000  1.000

Table 3: The empirical significance level and empirical power values under $H^S_{\sqrt{3}/8}$ and
$H^A_{\sqrt{3}/12}$, N = 1000, n = 100 and n = 500, and J = 13, at $\alpha = .05$ for the realization of $\mathcal{Y}$ in Figure 1.
Under the segregation alternative $H^S_\varepsilon$, the PAE of $\rho_n(r)$ is given by

$$\mathrm{PAE}^S_J(r) = \frac{\left(\mu_S''(r, J, \varepsilon=0)\right)^2}{\nu(r, J)}.$$

Under the association alternative $H^A_\varepsilon$ the PAE of $\rho_n(r)$ is similar.

The PAE curves for J = 13 (as in Figure 1) are similar to the ones for the J = 1 case
(see Figure 13), hence are omitted. Some values of note are $\lim_{r\to 0} \mathrm{PAE}^S_J(r) \approx
38.1954$, $\operatorname{argsup}_{r\in(0,1]} \mathrm{PAE}^S_J(r) = 1$ with $\mathrm{PAE}^S_J(r = 1) \approx 100.7740$. As for association,
$\lim_{r\to 0} \mathrm{PAE}^A_J(r) \approx 8593.9734$, $\mathrm{PAE}^A_J(r = 1) \approx 6449.5356$, $\operatorname{arginf}_{r\in(0,1]} \mathrm{PAE}^A_J(r) \approx .4948$
with $\mathrm{PAE}^A_J(r \approx .4948) \approx 5024.2236$. Based on the Pitman asymptotic efficiency analysis,
we suggest, for large n and small $\varepsilon$, choosing large r for testing against segregation and
small r against association. However, for moderate and small n, we suggest large r values
for association due to the skewness of the density of $\rho_n(r)$.



5.4 Extension to Higher Dimensions 

The extension of $N^r_{CS}$ to $\mathbb{R}^d$ for d > 2 is straightforward. Let $\mathcal{Y} = \{y_1, y_2, \ldots, y_{d+1}\}$ be
d + 1 points in general position. Denote the simplex formed by these d + 1 points as $S(\mathcal{Y})$.
(A simplex is the simplest polytope in $\mathbb{R}^d$ having d + 1 vertices, d(d + 1)/2 edges and d + 1
faces of dimension (d - 1).) For $r \in [0, 1]$, define the r-factor central similarity proximity
map as follows. Let $\varphi_j$ be the face opposite vertex $y_j$ for j = 1, 2, ..., d + 1, and "face
regions" $R(\varphi_1), \ldots, R(\varphi_{d+1})$ partition $S(\mathcal{Y})$ into d + 1 regions, namely the d + 1 polytopes
with vertices being the center of mass together with d vertices chosen from d + 1 vertices.
For $x \in S(\mathcal{Y}) \setminus \mathcal{Y}$, let $\varphi(x)$ be the face in whose region x falls; $x \in R(\varphi(x))$. (If x falls on
the boundary of two face regions, we assign $\varphi(x)$ arbitrarily.) For $r \in (0, 1]$, the r-factor
central similarity proximity region $N^r_{CS}(x) = N^r_{\mathcal{Y}}(x)$ is defined to be the simplex $S_r(x)$ with
the following properties:

(i) $S_r(x)$ has a face $\varphi_r(x)$ parallel to $\varphi(x)$ such that $r\, d(x, \varphi(x)) = d(\varphi_r(x), x)$ where
$d(x, \varphi(x))$ is the Euclidean (perpendicular) distance from x to $\varphi(x)$,

(ii) $S_r(x)$ has the same orientation as and is similar to $S(\mathcal{Y})$,

(iii) x is at the center of mass of $S_r(x)$. Note that r > 0 implies that $x \in N^r_{CS}(x)$.

For r = 0, define $N^r_{CS}(x) = \{x\}$ for all $x \in S(\mathcal{Y})$.

Theorem 1 generalizes, so that any simplex S in $\mathbb{R}^d$ can be transformed into a regular
polytope (with edges being equal in length and faces being equal in area) preserving uniformity.
Delaunay triangulation becomes Delaunay tessellation in $\mathbb{R}^d$, provided no more than
d + 1 points are cospherical (lying on the boundary of the same sphere). In particular,
with d = 3, the general simplex is a tetrahedron (4 vertices, 4 triangular faces and 6 edges),
which can be mapped into a regular tetrahedron (4 faces are equilateral triangles) with
vertices $(0, 0, 0)$, $(1, 0, 0)$, $(1/2, \sqrt{3}/2, 0)$, $(1/2, \sqrt{3}/6, \sqrt{6}/3)$.

Asymptotic normality of the U-statistic and consistency of the tests hold for d > 2.
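The regular tetrahedron quoted above is easy to check numerically. The following small sketch (helper names are illustrative) verifies that all six edges of the stated vertex set have unit length and reproduces the vertex, edge and facet counts of a d-dimensional simplex mentioned in the text.

```python
import itertools
import math

# Vertices of the regular tetrahedron given in the text (d = 3).
VERTICES = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.5, math.sqrt(3) / 2, 0.0),
    (0.5, math.sqrt(3) / 6, math.sqrt(6) / 3),
]


def edge_lengths(vertices):
    """Euclidean lengths of all edges (one per pair of vertices)."""
    return [math.dist(p, q) for p, q in itertools.combinations(vertices, 2)]


def simplex_counts(d):
    """Number of vertices, edges and (d-1)-dimensional faces of a d-simplex."""
    return {"vertices": d + 1, "edges": d * (d + 1) // 2, "facets": d + 1}
```

Every entry of `edge_lengths(VERTICES)` equals 1 up to floating-point rounding, so the four faces are indeed equilateral triangles.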



6 Discussion and Conclusions 



In this article, we investigate the mathematical and statistical properties of a new proximity
catch digraph (PCD) and its use in the analysis of spatial point patterns. The mathematical
results are the detailed computations of means and variances of the U-statistics under the
null and alternative hypotheses. These statistics require keeping good track of the geometry
of the relevant neighborhoods, and the complicated computations of integrals are done in
the symbolic computation package MAPLE. The methodology is similar to the one given
by Ceyhan et al. (2006). However, the results are simplified by deliberate choices we make.
For example, among many possibilities, the proximity map is defined in such a way that
the distribution of the domination number and relative density is geometry invariant for
uniform data in triangles, which allows the calculations on the standard equilateral triangle
rather than for each triangle separately.

In various fields, there are many tests available for spatial point patterns. An extensive
survey is provided by Kulldorff, who enumerates more than 100 such tests, most of which
need adjustment for some sort of inhomogeneity (Kulldorff (2006)). He also provides a
general framework to classify these tests. The most widely used tests include Pielou's test of
segregation for two classes (Pielou (1961)), due to its ease of computation and interpretation,
and Ripley's K(t) and L(t) functions (Ripley (1981)).

The first proximity map similar to the r-factor proximity map $N^r_{CS}$ in the literature is
the spherical proximity map $N_S(x) := B(x, r(x))$; see, e.g., Priebe et al. (2001). A slight
variation of $N_S$ is the arc-slice proximity map $N_{AS}(x) := B(x, r(x)) \cap T(x)$ where $T(x)$ is
the Delaunay cell that contains x (see Ceyhan and Priebe). Furthermore, Ceyhan
and Priebe introduced the (unparametrized) central similarity proximity map $N_{CS}$ in
(Ceyhan and Priebe (2003)) and another family of PCDs in (Ceyhan and Priebe (2005)).

The spherical proximity map $N_S$ is used in classification in the literature, but not for
testing spatial patterns between two or more classes. We develop a technique to test the
patterns of segregation or association. There are many tests available for segregation and
association in the ecology literature. See Dixon (1994) for a survey on these tests and relevant
references. Two of the most commonly used tests are Pielou's $\chi^2$ test of independence and
Ripley's test based on K(t) and L(t) functions. However, the test we introduce here is not
comparable to either of them. Our test is a conditional test (conditional on a realization of
J, the number of Delaunay triangles, and W, the set of relative areas of the Delaunay triangles)
and we require that the number of triangles J is fixed and relatively small compared to $n = |\mathcal{X}_n|$.
Furthermore, our method deals with a slightly different type of data than most methods to
examine spatial patterns. The sample size for one type of point (type $\mathcal{X}$ points) is much
larger compared to the other (type $\mathcal{Y}$ points). This implies that in practice, $\mathcal{Y}$ could be
stationary or have a much longer life span than members of $\mathcal{X}$. For example, a special type of
fungi might constitute the $\mathcal{X}$ points, while the tree species around which the fungi grow might
be viewed as the $\mathcal{Y}$ points.

The sampling structure for our asymptotic analysis is infill asymptotics (Cressie (1991)).
Moreover, our statistic can be written as a U-statistic based on the locations of type $\mathcal{X}$
points with respect to type $\mathcal{Y}$ points. This is one advantage of the proposed method: most
statistics for spatial patterns cannot be written as U-statistics. The U-statistic form gives
us asymptotic normality, once the mean and variance are obtained by detailed geometric
calculations.

The null hypothesis we consider is considerably more restrictive than current approaches,
which can be used much more generally. In particular, we consider the complete spatial
randomness pattern on the convex hull of $\mathcal{Y}$ points.

Based on the asymptotic analysis and finite sample performance of relative density of
the r-factor central similarity PCD, we recommend that large values of r ($r \lesssim 1$) be used
for segregation, regardless of the sample size. For association, we recommend large values of
r ($r \lesssim 1$) for small to moderate sample sizes, and small values of r ($r \gtrsim 0$) for large sample sizes. However, in a
practical situation, we will not know the pattern in advance. So as an automatic data-based
selection of r to test CSR against segregation or association, one can start with r = 1, and
if the relative density is found to be smaller than that under CSR (which is suggestive of
association), use any $r \in [.8, 1.0]$ for small to moderate sample sizes (n < 200), and use
$r \gtrsim 0$ (say r = .1) for large sample sizes n > 200. If the relative density is found to
be larger than that under CSR (which is suggestive of segregation), then use large r (any
$r \in [.8, 1.0]$) regardless of the sample size. However, for large r (say, $r \in [.8, 1.0]$), r = 1
has more geometric appeal than the rest, so it can be used when large r is recommended.
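The selection rule just described can be expressed as a small decision function. This is only a sketch of the heuristic in the paragraph above: the function name, the returned interval representation, the treatment of n = 200 as "small to moderate", and the handling of an exact tie as the segregation branch are all assumptions made here for illustration.

```python
def recommend_r(rho_at_r1, expected_under_csr, n, large_n_threshold=200):
    """Suggest an interval (low, high) of r values for the follow-up test.

    rho_at_r1          -- relative density observed with r = 1
    expected_under_csr -- its expected value under CSR (e.g. mu(1) = 1/6
                          for a single triangle, or mu(1, J) in general)
    n                  -- number of X points
    """
    if rho_at_r1 < expected_under_csr:
        # Smaller than under CSR: suggestive of association.
        if n > large_n_threshold:
            return (0.1, 0.1)  # small r, "say r = .1"
        return (0.8, 1.0)
    # Larger than under CSR: suggestive of segregation, any sample size.
    return (0.8, 1.0)
```

Whenever the returned interval is `(0.8, 1.0)`, the text's remark about geometric appeal suggests simply taking r = 1.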

Although the statistical analysis and the mathematical properties related to the r-factor
central similarity proximity catch digraph are done in $\mathbb{R}^2$, the extension to $\mathbb{R}^d$ with d > 2 is
straightforward. Moreover, the geometry invariance, asymptotic normality of the U-statistic
and consistency of the tests hold for d > 2.

Throughout the article, we avoid providing a real life example, because the procedure
in its current form ignores the $\mathcal{X}$ points outside the convex hull of $\mathcal{Y}$ points (which is
referred to as the boundary influence or edge effect in the ecology literature). Furthermore, the
spatial patterns of segregation and association are closely related to the pattern classification
problem. These aspects are topics of ongoing research.
APPENDIX



Proof of Theorem 1 

A composition of translation, rotation, reflections, and scaling will take any given
triangle $T_o = T(y_1, y_2, y_3)$ to the "basic" triangle $T_b = T((0, 0), (1, 0), (c_1, c_2))$ with $0 <
c_1 \le 1/2$, $c_2 > 0$ and $(1 - c_1)^2 + c_2^2 \le 1$, preserving uniformity. The transformation
$\phi_e : \mathbb{R}^2 \to \mathbb{R}^2$ given by $\phi_e(u, v) = \left(u + \frac{1 - 2c_1}{2c_2}\, v,\ \frac{\sqrt{3}}{2c_2}\, v\right)$ takes $T_b$ to the equilateral
triangle $T_e = T((0, 0), (1, 0), (1/2, \sqrt{3}/2))$. Investigation of the Jacobian shows that $\phi_e$ also
preserves uniformity. Furthermore, the composition of $\phi_e$ with the rigid motion
transformations maps the boundary of the original triangle $T_o$ to the boundary of the equilateral
triangle $T_e$, the median lines of $T_o$ to the median lines of $T_e$, lines parallel to the edges
of $T_o$ to lines parallel to the edges of $T_e$, and straight lines that cross $T_o$ to the straight
lines that cross $T_e$. Since the joint distribution of any collection of the $h_{ij}$ involves only
probability content of unions and intersections of regions bounded by precisely such lines,
and the probability content of such regions is preserved since uniformity is preserved, the
desired result follows. ■



Derivation of $\mu(r)$ and $\nu(r)$

Let $M_j$ be the midpoint of edge $e_j$ for j = 1, 2, 3, $M_C$ be the center of mass, and $T_s :=
T(y_1, M_3, M_C)$. By symmetry $\mu(r) = P(X_2 \in N^r_{CS}(X_1)) = 6\,P(X_2 \in N^r_{CS}(X_1),\ X_1 \in T_s)$.
Then

$$P(X_2 \in N^r_{CS}(X_1),\ X_1 \in T_s) = \int_0^{1/2} \int_0^{\ell_{am}(x)} \frac{A(N^r_{CS}(x_1))}{A(T(\mathcal{Y}))^2}\, dy\, dx = r^2/36$$

where $A(N^r_{CS}(x_1)) = 3\sqrt{3}\, r^2 y^2$, $A(T(\mathcal{Y})) = \sqrt{3}/4$, and $\ell_{am}(x) = x/\sqrt{3}$. Hence $\mu(r) =
r^2/6$.
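The value of the mean can be checked by simulation without any symbolic integration. The sketch below is illustrative and rests on an assumption of ours: in barycentric coordinates $\lambda(x)$ with respect to the triangle, the proximity region of x is taken to be the set of points z with $\lambda_k(z) \ge \lambda_k(x) - r \min_j \lambda_j(x)$ for every k, so that its area is $9 r^2 \min_j \lambda_j(x)^2$ times the area of the triangle (this agrees with $3\sqrt{3}\, r^2 y^2$ on $T_s$). The function names and seeds are hypothetical.

```python
import random


def uniform_barycentric(rng):
    """Barycentric coordinates of a uniform random point in a triangle."""
    u, v = sorted((rng.random(), rng.random()))
    return (u, v - u, 1.0 - v)


def in_proximity_region(x, z, r):
    """True if z lies in the (assumed barycentric form of the) region of x."""
    shrink = r * min(x)
    return all(zk - xk >= -shrink for xk, zk in zip(x, z))


def area_ratio(x, r):
    """Area of the proximity region of x divided by the triangle area."""
    return 9.0 * r * r * min(x) ** 2


def estimate_mu_by_pairs(r, pairs, seed=0):
    """Monte Carlo estimate of P(X2 in N(X1)) from independent pairs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(pairs):
        x1 = uniform_barycentric(rng)
        x2 = uniform_barycentric(rng)
        hits += in_proximity_region(x1, x2, r)
    return hits / pairs


def estimate_mu_by_area(r, samples, seed=0):
    """Monte Carlo estimate of E[A(N(X1))] / A(T)."""
    rng = random.Random(seed)
    total = sum(area_ratio(uniform_barycentric(rng), r) for _ in range(samples))
    return total / samples
```

Both estimators should settle near $r^2/6$ as the number of draws grows; the area-based one has the smaller variance because it averages the conditional probability rather than the indicator.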

Next, we find the asymptotic variance term. Let

$P^r_{2N} := P(\{X_2, X_3\} \subset N^r_{CS}(X_1))$, $P^r_{2G} := P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{CS}))$ and

$P^r_{M} := P(X_2 \in N^r_{CS}(X_1),\ X_3 \in \Gamma_1(X_1, N^r_{CS}))$,

where $\Gamma_1(x, N^r_{CS})$ is the $\Gamma_1$-region of x based on $N^r_{CS}$ and defined as $\Gamma_1(x, N^r_{CS}) := \{y \in
T(\mathcal{Y}) : x \in N^r_{CS}(y)\}$. See (Ceyhan and Priebe (2005)) for more detail.
Then $\mathbf{Cov}[h_{12}, h_{13}] = \mathbf{E}[h_{12} h_{13}] - \mathbf{E}[h_{12}]\,\mathbf{E}[h_{13}]$ where

$$\mathbf{E}[h_{12} h_{13}] = P(\{X_2, X_3\} \subset N^r_{CS}(X_1)) + 2\,P(X_2 \in N^r_{CS}(X_1),\ X_3 \in \Gamma_1(X_1, N^r_{CS})) + P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{CS})) = P^r_{2N} + 2P^r_{M} + P^r_{2G}.$$

Hence $\nu(r) = \mathbf{Cov}[h_{12}, h_{13}] = (P^r_{2N} + 2P^r_{M} + P^r_{2G}) - [2\mu(r)]^2$.

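To find the covariance, we need to find the possible types of $\Gamma_1(x_1, N^r_{CS})$ and $N^r_{CS}(x_1)$
for $r \in (0, 1]$. There are four cases regarding $\Gamma_1(x_1, N^r_{CS})$ and one case for $N^r_{CS}(x_1)$. See
Figure 14 for the prototypes of these four cases of $\Gamma_1(x_1, N^r_{CS})$ where, for $(x_1, y_1) \in T(\mathcal{Y})$,
the explicit forms of $\zeta_j(r, x)$ are

$$\zeta_1(r,x) = \frac{\sqrt{3}\,y_1 + 3x_1 - 3x}{\sqrt{3}\,(1 + 2r)}, \qquad \zeta_2(r,x) = \frac{\sqrt{3}\,y_1 - 3x_1 + 3x}{\sqrt{3}\,(1 + 2r)},$$

$$\zeta_3(r,x) = \frac{3x_1 + 3r - 3rx - 3x - \sqrt{3}\,y_1}{\sqrt{3}\,(-1 + r)}, \qquad \zeta_4(r,x) = \frac{r\sqrt{3} - r\sqrt{3}\,x + 2y_1}{2 + r},$$

$$\zeta_5(r,x) = \frac{r\sqrt{3}\,x + 2y_1}{2 + r}, \qquad \zeta_6(r,x) = \frac{-3x - 3rx + 3x_1 + \sqrt{3}\,y_1}{\sqrt{3}\,(1 - r)}, \qquad \zeta_7(r,x) = \frac{y_1}{1 - r}.$$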



Each case j corresponds to the region $R_j$ in Figure 15 where

$$q_1(x) = \frac{1 - r}{2\sqrt{3}}, \qquad q_2(x) = \frac{(x - 1)(r - 1)}{\sqrt{3}\,(1 + r)}, \qquad q_3(x) = \frac{(1 - r)\,x}{\sqrt{3}\,(1 + r)},$$

and $s_1 = (1 - r)/2$.

The explicit forms of $R_j$, j = 1, ..., 4 are as follows:

$R_1 = \{(x, y) \in [0, 1/2] \times [0, q_3(x)]\}$,

$R_2 = \{(x, y) \in [0, s_1] \times [q_3(x), \ell_{am}(x)] \cup [s_1, 1/2] \times [q_3(x), q_2(x)]\}$,

$R_3 = \{(x, y) \in [s_1, 1/2] \times [q_2(x), q_1(x)]\}$,

$R_4 = \{(x, y) \in [s_1, 1/2] \times [q_1(x), \ell_{am}(x)]\}$.

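By symmetry,

$$P(\{X_2, X_3\} \subset N^r_{CS}(X_1)) = 6\,P(\{X_2, X_3\} \subset N^r_{CS}(X_1),\ X_1 \in T_s),$$

and

$$P(\{X_2, X_3\} \subset N^r_{CS}(X_1),\ X_1 \in T_s) = \int_0^{1/2} \int_0^{\ell_{am}(x)} \frac{A(N^r_{CS}(x_1))^2}{A(T(\mathcal{Y}))^3}\, dy\, dx = r^4/90,$$

where $A(N^r_{CS}(x_1)) = 3\sqrt{3}\, r^2 y^2$. Hence,

$$P(\{X_2, X_3\} \subset N^r_{CS}(X_1)) = r^4/15.$$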

Next, by symmetry,

$$P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{CS})) = 6\,P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{CS}),\ X_1 \in T_s)$$

and

$$P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{CS}),\ X_1 \in T_s) = \sum_{j=1}^{4} P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{CS}),\ X_1 \in R_j).$$

Figure 14: The prototypes of the four cases of $\Gamma_1(x_1, N^r_{CS})$ for $x_1 \in T(y_1, M_3, M_C)$ with
r = 1/2.
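For $x_1 \in R_1$,

$$P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{CS}),\ X_1 \in R_1) = \int_0^{1/2} \int_0^{q_3(x)} \frac{A(\Gamma_1(x_1, N^r_{CS}))^2}{A(T(\mathcal{Y}))^3}\, dy\, dx = \frac{r^4\,(1 - r)}{90\,(1 + 2r)^2 (1 + r)^5},$$

where $A(\Gamma_1(x_1, N^r_{CS})) = \dfrac{3\sqrt{3}\, r^2 y^2}{(r - 1)^2 (2r + 1)}$.

For $x_1 \in R_2$,

$$P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{CS}),\ X_1 \in R_2) = \int_0^{s_1} \int_{q_3(x)}^{\ell_{am}(x)} \frac{A(\Gamma_1(x_1, N^r_{CS}))^2}{A(T(\mathcal{Y}))^3}\, dy\, dx + \int_{s_1}^{1/2} \int_{q_3(x)}^{q_2(x)} \frac{A(\Gamma_1(x_1, N^r_{CS}))^2}{A(T(\mathcal{Y}))^3}\, dy\, dx$$

$$= \frac{r^5\,(4r^6 + 6r^5 - 12r^4 - 21r^3 + 14r^2 + 40r + 20)(1 - r)}{45\,(2r + 1)^2 (r + 2)^2 (r + 1)^5},$$

where $A(\Gamma_1(x_1, N^r_{CS})) = \dfrac{3\sqrt{3}\,(x^2 r + 2\sqrt{3}\,xyr - y^2 r - x^2 + 2\sqrt{3}\,xy - 3y^2)\, r}{4\,(1 - r)(2r + 1)(r + 2)}$.

Figure 15: The regions corresponding to the prototypes of the four cases with r = 1/2.

For $x_1 \in R_3$,

$$P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{CS}),\ X_1 \in R_3) = \int_{s_1}^{1/2} \int_{q_2(x)}^{q_1(x)} \frac{A(\Gamma_1(x_1, N^r_{CS}))^2}{A(T(\mathcal{Y}))^3}\, dy\, dx = \frac{r^6\,(1 - r)(6r^6 - 35r^4 + 130r^2 + 160r + 60)}{90\,(2r + 1)^2 (r + 2)^2 (r + 1)^5},$$

where

$$A(\Gamma_1(x_1, N^r_{CS})) = -\frac{3\sqrt{3}\,(2x^2r^2 + 2y^2r^2 - 4x^2r - 2xr^2 + 4y^2r + 2\sqrt{3}\,yr^2 + 2x^2 + 4xr + 6y^2 + r^2 - 2x - 2\sqrt{3}\,y - 2r + 1)\, r}{4\,(2r + 1)(r - 1)^2 (r + 2)}.$$

For $x_1 \in R_4$,

$$P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{CS}),\ X_1 \in R_4) = \int_{s_1}^{1/2} \int_{q_1(x)}^{\ell_{am}(x)} \frac{A(\Gamma_1(x_1, N^r_{CS}))^2}{A(T(\mathcal{Y}))^3}\, dy\, dx = \frac{r^6\,(r^2 - 5r + 10)}{15\,(2r + 1)^2 (r + 2)^2},$$

where $A(\Gamma_1(x_1, N^r_{CS})) = -\dfrac{\sqrt{3}\,(3x^2 + 3y^2 - 3x - \sqrt{3}\,y - r + 1)\, r}{2\,(2r + 1)(r + 2)}$.

So

$$P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{CS})) = 6 \left( \frac{-(r^2 - 7r - 2)\, r^4}{90\,(r + 1)(2r + 1)(r + 2)} \right) = \frac{-(r^2 - 7r - 2)\, r^4}{15\,(r + 1)(2r + 1)(r + 2)}.$$

(The leading minus signs in the areas for $R_3$ and $R_4$ and in the last display make these quantities positive on $(0, 1]$; with them, the four region terms sum to the stated total and the total is consistent with $\mathbf{E}[h_{12} h_{13}] = P^r_{2N} + 2P^r_M + P^r_{2G}$.)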



Furthermore, by symmetry,

$$P(X_2 \in N^r_{CS}(X_1),\ X_3 \in \Gamma_1(X_1, N^r_{CS})) = 6\,P(X_2 \in N^r_{CS}(X_1),\ X_3 \in \Gamma_1(X_1, N^r_{CS}),\ X_1 \in T_s),$$

and

$$P(X_2 \in N^r_{CS}(X_1),\ X_3 \in \Gamma_1(X_1, N^r_{CS}),\ X_1 \in T_s) = \sum_{j=1}^{4} P(X_2 \in N^r_{CS}(X_1),\ X_3 \in \Gamma_1(X_1, N^r_{CS}),\ X_1 \in R_j)$$

where $P(X_2 \in N^r_{CS}(X_1),\ X_3 \in \Gamma_1(X_1, N^r_{CS}),\ X_1 \in R_j)$ can be calculated with the
same region of integration with the integrand being replaced by $\dfrac{A(N^r_{CS}(x_1))\, A(\Gamma_1(x_1, N^r_{CS}))}{A(T(\mathcal{Y}))^3}$.
Then

$$P(X_2 \in N^r_{CS}(X_1),\ X_3 \in \Gamma_1(X_1, N^r_{CS})) = 6 \left( \frac{(2r^4 - 3r^3 - 4r^2 + 10r + 4)\, r^4}{180\,(2r + 1)(r + 2)} \right) = \frac{(2r^4 - 3r^3 - 4r^2 + 10r + 4)\, r^4}{30\,(2r + 1)(r + 2)}.$$

Hence

$$\mathbf{E}[h_{12} h_{13}] = \frac{r^4\,(2r^5 - r^4 - 5r^3 + 12r^2 + 28r + 8)}{15\,(r + 1)(2r + 1)(r + 2)}.$$

Therefore,

$$\nu(r) = \frac{r^4\,(6r^5 - 3r^4 - 25r^3 + r^2 + 49r + 14)}{45\,(r + 1)(2r + 1)(r + 2)}.$$

For r = 0, it is trivial to see that $\nu(r) = 0$.
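The algebra connecting these closed forms can be verified in exact rational arithmetic. The sketch below is a check of internal consistency only, not a re-derivation of the integrals. It assumes the reading of the formulas used here, in particular that the probability for both points falling in the $\Gamma_1$-region carries the numerator $(-r^2 + 7r + 2)\,r^4$ (the sign that keeps it positive on $(0, 1]$) and that the four region terms have denominators $(2r+1)^2 (r+2)^2 (r+1)^5$ as indicated in the comments. All function names are hypothetical.

```python
from fractions import Fraction


def mu(r):
    return r**2 / 6


def p_2n(r):
    """P({X2, X3} in N(X1)) = r^4 / 15."""
    return r**4 / 15


def p_m(r):
    """P(X2 in N(X1), X3 in Gamma_1(X1))."""
    return (2 * r**4 - 3 * r**3 - 4 * r**2 + 10 * r + 4) * r**4 / (
        30 * (2 * r + 1) * (r + 2)
    )


def p_2g(r):
    """P({X2, X3} in Gamma_1(X1)); sign chosen so the value is positive."""
    return (-(r**2) + 7 * r + 2) * r**4 / (15 * (r + 1) * (2 * r + 1) * (r + 2))


def region_terms(r):
    """Contributions of R_1, ..., R_4 to P({X2, X3} in Gamma_1, X1 in T_s)."""
    t1 = r**4 * (1 - r) / (90 * (1 + 2 * r) ** 2 * (1 + r) ** 5)
    t2 = (
        r**5
        * (4 * r**6 + 6 * r**5 - 12 * r**4 - 21 * r**3 + 14 * r**2 + 40 * r + 20)
        * (1 - r)
        / (45 * (2 * r + 1) ** 2 * (r + 2) ** 2 * (r + 1) ** 5)
    )
    t3 = (
        r**6
        * (1 - r)
        * (6 * r**6 - 35 * r**4 + 130 * r**2 + 160 * r + 60)
        / (90 * (2 * r + 1) ** 2 * (r + 2) ** 2 * (r + 1) ** 5)
    )
    t4 = r**6 * (r**2 - 5 * r + 10) / (15 * (2 * r + 1) ** 2 * (r + 2) ** 2)
    return [t1, t2, t3, t4]


def e_h12_h13(r):
    return (
        r**4
        * (2 * r**5 - r**4 - 5 * r**3 + 12 * r**2 + 28 * r + 8)
        / (15 * (r + 1) * (2 * r + 1) * (r + 2))
    )


def nu(r):
    return (
        r**4
        * (6 * r**5 - 3 * r**4 - 25 * r**3 + r**2 + 49 * r + 14)
        / (45 * (r + 1) * (2 * r + 1) * (r + 2))
    )


def check(r):
    """True if all three identities hold exactly at the rational value r."""
    r = Fraction(r)
    return (
        6 * sum(region_terms(r)) == p_2g(r)
        and p_2n(r) + 2 * p_m(r) + p_2g(r) == e_h12_h13(r)
        and e_h12_h13(r) - (2 * mu(r)) ** 2 == nu(r)
    )
```

Evaluating `check` at a handful of rational points, such as `Fraction(1, 2)` or `Fraction(3, 4)`, exercises all three identities at once.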

Sketch of Proof of Theorem 3 

Under the alternatives, i.e. $\varepsilon > 0$, $\rho_n(r)$ is a U-statistic with the same symmetric kernel
$h_{ij}$ as in the null case. The mean $\mu_S(r, \varepsilon) = \mathbf{E}_\varepsilon[\rho_n(r)] = \mathbf{E}_\varepsilon[h_{12}]/2$ (and $\mu_A(r, \varepsilon)$), now
a function of both r and $\varepsilon$, is again in [0, 1]. $\nu_S(r, \varepsilon) = \mathbf{Cov}_\varepsilon[h_{12}, h_{13}]$ (and $\nu_A(r, \varepsilon)$),
also a function of both r and $\varepsilon$, is bounded above by 1/4, as before. Thus asymptotic
normality obtains provided that $\nu_S(r, \varepsilon) > 0$ ($\nu_A(r, \varepsilon) > 0$); otherwise $\rho_n(r)$ is degenerate.
The explicit forms of $\mu_S(r, \varepsilon)$ and $\mu_A(r, \varepsilon)$ are given, defined piecewise, in the Appendix.
Note that under $H^S_\varepsilon$,



(r,e) > for (r,e) G ((0,l] x (0, 3 v^/io] ) [j ( ( 2 ^_ JL e) ,1 



3\/3/10, V3/3 



and under $H^A_\varepsilon$,

$\nu_A(r, \varepsilon) > 0$ for $(r, \varepsilon) \in (0, 1] \times (0, \sqrt{3}/3)$.
Sketch of Proof of Theorem 4

Since the variance of the asymptotically normal test statistic, under both the null and the
alternatives, converges to 0 as $n \to \infty$ (or is degenerate), it remains to show that the mean
under the null, $\mu(r) = \mathbf{E}[\rho_n(r)]$, is less than (greater than) the mean under the alternative,
$\mu_S(r, \varepsilon) = \mathbf{E}_\varepsilon[\rho_n(r)]$ ($\mu_A(r, \varepsilon)$) against segregation (association) for $\varepsilon > 0$. Whence it will
follow that power converges to 1 as $n \to \infty$.

It is possible, albeit tedious, to compute $\mu_S(r, \varepsilon)$ and $\mu_A(r, \varepsilon)$ under the two alternatives.
The calculations are deferred to the technical report by Ceyhan et al. (2004) due to their
extreme length and technicality, but the resulting explicit forms are provided in the Appendix.
Detailed analysis of $\mu_S(r, \varepsilon)$ indicates that under segregation $\mu_S(r, \varepsilon) > \mu(r)$
for all $\varepsilon > 0$ and $r \in (0, 1]$. Likewise, detailed analysis of $\mu_A(r, \varepsilon)$ indicates that under
association $\mu_A(r, \varepsilon) < \mu(r)$ for all $\varepsilon > 0$ and $r \in (0, 1]$. We direct the reader to the
technical report for the details of the calculations. Hence the desired result follows for both
alternatives. ■

Proof of Corollary 1 

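In the multiple triangle case,

$$\mu(r, J) = \mathbf{E}[\rho_n(r)] = \frac{1}{n(n-1)} \sum\sum_{i<j} \mathbf{E}[h_{ij}] = \frac{1}{2}\,\mathbf{E}[h_{12}] = \mathbf{E}[I(A_{12})] = P(A_{12}) = P(X_2 \in N^r_{CS}(X_1)).$$

But, by definition of $N^r_{CS}(\cdot)$, $X_2 \notin N^r_{CS}(X_1)$ a.s. if $X_1$ and $X_2$ are in different triangles. So
by the law of total probability

$$\mu(r, J) := P(X_2 \in N^r_{CS}(X_1)) = \sum_{j=1}^{J} P(X_2 \in N^r_{CS}(X_1) \mid \{X_1, X_2\} \subset T_j)\, P(\{X_1, X_2\} \subset T_j)$$

$$= \sum_{j=1}^{J} \mu(r)\, P(\{X_1, X_2\} \subset T_j) \quad (\text{since } P(X_2 \in N^r_{CS}(X_1) \mid \{X_1, X_2\} \subset T_j) = \mu(r))$$

$$= \mu(r) \sum_{j=1}^{J} \left(A(T_j) / A(C_H(\mathcal{Y}))\right)^2 \quad (\text{since } P(\{X_1, X_2\} \subset T_j) = \left(A(T_j) / A(C_H(\mathcal{Y}))\right)^2).$$

Letting $w_j := A(T_j)/A(C_H(\mathcal{Y}))$, we get $\mu(r, J) = \mu(r) \cdot \big(\sum_{j=1}^{J} w_j^2\big)$ where $\mu(r)$ is given by
Equation (8).

Furthermore, the asymptotic variance is

$$\nu(r, J) = \mathbf{E}[h_{12} h_{13}] - \mathbf{E}[h_{12}]\,\mathbf{E}[h_{13}]$$

$$= P(\{X_2, X_3\} \subset N^r_{CS}(X_1)) + 2\,P(X_2 \in N^r_{CS}(X_1),\ X_3 \in \Gamma_1(X_1, N^r_{CS})) + P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{CS})) - 4\,(\mu(r, J))^2.$$

Then for J > 1, we have

$$P(\{X_2, X_3\} \subset N^r_{CS}(X_1)) = \sum_{j=1}^{J} P(\{X_2, X_3\} \subset N^r_{CS}(X_1) \mid \{X_1, X_2, X_3\} \subset T_j)\, P(\{X_1, X_2, X_3\} \subset T_j)$$

$$= \sum_{j=1}^{J} P^r_{2N} \left(A(T_j)/A(C_H(\mathcal{Y}))\right)^3 = P^r_{2N} \Big(\sum_{j=1}^{J} w_j^3\Big).$$

Similarly, $P(X_2 \in N^r_{CS}(X_1),\ X_3 \in \Gamma_1(X_1, N^r_{CS})) = P^r_{M} \big(\sum_{j=1}^{J} w_j^3\big)$ and $P(\{X_2, X_3\} \subset
\Gamma_1(X_1, N^r_{CS})) = P^r_{2G} \big(\sum_{j=1}^{J} w_j^3\big)$, hence,

$$\nu(r, J) = (P^r_{2N} + 2P^r_{M} + P^r_{2G}) \Big(\sum_{j=1}^{J} w_j^3\Big) - 4\,\mu(r, J)^2 = \nu(r) \Big(\sum_{j=1}^{J} w_j^3\Big) + 4\,\mu(r)^2 \left(\sum_{j=1}^{J} w_j^3 - \Big(\sum_{j=1}^{J} w_j^2\Big)^2\right),$$

so conditional on W, if $\nu(r, J) > 0$ then $\sqrt{n}\,(\rho_n(r) - \mu(r, J)) \stackrel{\mathcal{L}}{\longrightarrow} \mathcal{N}(0, \nu(r, J))$.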